MT Lunch Seminar (LTI, CMU)

Tuesday, August 14, 2007

Experiments with a Noun-Phrase driven Statistical Machine Translation System

Title: Experiments with a Noun-Phrase driven Statistical Machine
Translation System

Speaker: Sanjika Hewavitharana
Date: Aug 14, 2007
Time: 12:00pm
Place: NSH 3305 [During MT Group Monthly Lunch]

Abstract:

Hierarchical translation models that use phrases with words as well as
sub-phrases have shown better performance than standard phrase-based
systems. In this talk I will present a noun-phrase (NP) driven
statistical machine translation system. Using noun-phrases as the
decomposition unit, we build a two-level hierarchy of phrases. We first
identify noun-phrases in the data and replace them with a tag to produce
an NP-tagged corpus. This corpus is then used to extract NP-tagged
phrase translation pairs. Both noun-phrases and NP-tagged phrases are
used in a two-level translation decoder. The two-level system shows
significant improvements over a baseline phrase-based SMT system. It
also produces longer matching phrases due to the generalization
introduced by tagging noun-phrases.
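
The tagging step described in the abstract can be illustrated with a
minimal sketch. It assumes the noun-phrase spans have already been found
by some chunker and are given as non-overlapping (start, end) token
offsets. The tag symbol "@NP" and the function name tag_noun_phrases are
hypothetical and are not taken from the talk.

```python
from typing import List, Tuple

NP_TAG = "@NP"  # hypothetical placeholder symbol; the talk does not name the tag


def tag_noun_phrases(
    tokens: List[str], np_spans: List[Tuple[int, int]]
) -> Tuple[List[str], List[List[str]]]:
    """Replace each noun-phrase span with NP_TAG.

    np_spans holds half-open (start, end) token offsets that are assumed
    to be non-overlapping. Returns the tagged sentence and the list of
    noun phrases that were removed, in left-to-right order.
    """
    tagged: List[str] = []
    extracted: List[List[str]] = []
    pos = 0
    for start, end in sorted(np_spans):
        if start < pos or end <= start or end > len(tokens):
            raise ValueError(f"invalid or overlapping span: ({start}, {end})")
        tagged.extend(tokens[pos:start])      # copy words before the NP
        tagged.append(NP_TAG)                 # upper level: NP becomes one tag
        extracted.append(tokens[start:end])   # lower level: the NP itself
        pos = end
    tagged.extend(tokens[pos:])               # copy words after the last NP
    return tagged, extracted


sentence = "the old man saw a red car".split()
tagged, phrases = tag_noun_phrases(sentence, [(0, 3), (4, 7)])
print(" ".join(tagged))
print([" ".join(p) for p in phrases])
```

In this sketch the tagged sentence would feed extraction of NP-tagged
phrase pairs, while the removed noun phrases would be translated
separately, mirroring the two levels mentioned in the abstract.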

Tuesday, May 22, 2007

In Search of Better MT Evaluation Metric : Some Experiments

Date: May 22, 2007
Presenter: Abhaya Agarwal
Title: In Search of Better MT Evaluation Metric : Some Experiments

Abstract: The area of automatic metrics for MT evaluation has seen a lot of
activity in the last 4-5 years. Starting with BLEU, many such metrics have
been proposed over the years, including METEOR, HTER and ROUGE. While
these metrics achieve good correlation with human judgments at the
system level, the situation remains bleak at the individual sentence level. In
this talk, I will describe some work that we have been doing towards
developing metrics with improved correlation with human judgments at the
sentence level. I will present the current results and some possible
future directions.

Tuesday, April 17, 2007

An Assessment of Language Elicitation without the Supervision of a Linguist

Date: Apr 17, 2007
Presenters: Lori Levin and Alison Alvarez
Title: An Assessment of Language Elicitation without the Supervision of a Linguist

We created an elicitation corpus designed to elicit the morphosyntactic
features of a target language without the supervision of a linguist.
The corpus is composed of approximately 3200 English source sentences
that are then translated by a native speaker into the target language.
The design of our corpus was driven by our need to elicit morphosyntactic
language features without the supervision of a linguist. In a previous
paper we reported on a reverse treebank, a deep morphosyntactic tree
with two parallel human language sentences. The
first is provided by reverse annotation and the second is acquired
through elicitation. This presentation will focus on the extent to which
we were able to acquire our morphosyntactic information from our translated
corpora and the types of errors we encountered, both from the perspective
of the translator and the corpus itself.

Tuesday, February 20, 2007

Translation Model Pruning

Date: Feb 20, 2007

Presenter: Matthias Eck

Title: Translation Model Pruning

Tuesday, January 16, 2007

SALM: Suffix Array and its Applications in Empirical Language Processing

Date: Jan 16, 2007

Presenter: Ying (Joy) Zhang

Title: SALM: Suffix Array and its Applications in Empirical Language Processing

Tuesday, November 21, 2006

Simulating Multiple Translations and ASR Transcripts for Applications in Multilingual Spoken Document Classification

Title: Simulating Multiple Translations and ASR Transcripts for Applications in Multilingual Spoken Document Classification

Speaker: Wei-Hao Lin from the Informedia group

Abstract:
We propose a statistical model to simulate multiple documents and
their translations (e.g. Chinese documents and their English
translations), and apply the model in the task of classifying
multilingual documents. The model, based on a frequency matching
principle, predicts that previous approaches to building classifiers
from a common language (e.g., English) are not optimal for
multilingual collections with unbalanced numbers of documents, and that a
proposed multilingual representation can outperform the monolingual
bag-of-words representation. We also investigate the possibility of
combining multiple ASR transcripts and translations through
re-weighting. The validity of our model is strongly supported by
the close match between predictions of the simulation model and the
empirical results of classifying multilingual spoken documents from
broadcast news in three languages.

Tuesday, October 17, 2006

Coupling of ASR+MT: Initial Experiments & Future Directions

Speaker: Ian Lane

Title: Tighter Coupling of ASR+MT: Initial Experiments & Future Directions

Abstract:
In this talk, I will first give a brief overview of my PhD work entitled "Flexible Spoken Language Understanding based on Topic Classification and Domain Detection", and describe how the proposed approaches can be applied to applications other than speech-to-speech translation.
I will then describe my current work, which focuses on improving the coupling between ASR and machine translation systems, specifically when applied to conversational speech. Finally, I will propose future directions, on which I hope to receive a large amount of feedback.