MT Lunch Seminar (LTI, CMU)

Tuesday, May 13, 2008

Statistical Transfer MT Systems for French and German

Speaker: Greg Hanneman

Date: Tuesday, May 13, at noon
Location: Wean Hall 4623

Title: Statistical Transfer MT Systems for French and German

Abstract: The AVENUE research group's statistical transfer system is a general framework for creating hybrid machine translation systems. It uses two main resources: a weighted synchronous context-free grammar, and a probabilistic bilingual lexicon of syntax-based word- and phrase-level translations. Over the last six months, we have developed new methods for extracting these resources automatically from parsed and word-aligned parallel corpora. In this talk, I will describe the resource-extraction process as it was applied to new French-English and German-English systems for the upcoming ACL workshop on statistical machine translation. Preliminary evaluation results, both automatic and human-assessed, will also be reviewed.

Tuesday, April 22, 2008

Simulating Sentence Pairs Sampling Process via Source and Target Language Models

Speaker: Nguyen Bach

Abstract: In a traditional word alignment process, every sentence pair is assigned the same occurrence count, which is normalized during training to produce the empirical probability. However, some sentence pairs may be more valuable, reliable, and appropriate than others, and these should therefore carry a higher weight in training. To address this, we explored methods of resampling sentence pairs. We investigated three sets of features: sentence pair confidence (sc), genre-dependent sentence pair confidence (gdsc), and sentence-dependent phrase alignment confidence (sdpc) scores. These features were calculated over the entire training corpus and could easily be integrated into a phrase-based machine translation system.
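The resampling idea in this abstract can be illustrated with a minimal sketch. It assumes each sentence pair already has a single non-negative confidence score; how the sc, gdsc, and sdpc scores are actually computed is not given in the abstract, so the function names and the scores below are hypothetical.

```python
import random


def resample_sentence_pairs(pairs, confidences, n_samples, seed=0):
    """Draw sentence pairs with replacement, proportionally to confidence.

    `confidences` is a hypothetical per-pair score (assumption); a pair
    with a higher score is expected to occur more often in the sample.
    """
    if len(pairs) != len(confidences):
        raise ValueError("need exactly one confidence score per sentence pair")
    rng = random.Random(seed)
    return rng.choices(pairs, weights=confidences, k=n_samples)


def empirical_weights(sample):
    """Normalize occurrence counts in the resampled corpus to probabilities."""
    counts = {}
    for pair in sample:
        counts[pair] = counts.get(pair, 0) + 1
    total = len(sample)
    return {pair: count / total for pair, count in counts.items()}
```

With uniform confidences this reduces to the traditional setting in which every pair has the same expected occurrence count.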

Wednesday, March 19, 2008

Communicating Unknown Words in Machine Translation

Speaker: Matthias Eck

Title: Communicating Unknown Words in Machine Translation

Abstract:
Unknown words are a major problem for every machine translation system. Regular evaluations and demos do not always show this clearly, but in actual communication the lack of specialty vocabulary and named-entity translations can seriously impair the ability to communicate.

A new approach is presented that uses monolingual encyclopedias and dictionaries to "communicate" unknown words. Instead of the actual unknown word, its definition is extracted and translated, which leads to considerable improvements in translation quality.
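As a rough illustration of the substitution step, the sketch below replaces each out-of-vocabulary token with a source-language definition before the sentence is passed to the translation system. The vocabulary set, the definition dictionary, and the choice to wrap definitions in parentheses are all assumptions made for this example; the abstract does not say how definitions are extracted or inserted.

```python
def substitute_unknown_words(tokens, known_vocab, definitions):
    """Replace unknown tokens with their definitions, when one is available.

    known_vocab: set of lowercase words the MT system can translate (assumed).
    definitions: lowercase word -> definition text taken from a monolingual
                 dictionary or encyclopedia (assumed to be precomputed).
    """
    output = []
    for token in tokens:
        key = token.lower()
        if key in known_vocab or key not in definitions:
            # Known word, or unknown word with no definition: keep it as is.
            output.append(token)
        else:
            output.append("(" + definitions[key] + ")")
    return output
```

The resulting token sequence contains only words the system is more likely to know, at the cost of a longer sentence.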

Tuesday, November 13, 2007

Trees that can help

Speaker: Alok Parlikar

Title: (S (NP (NP Trees) (SBAR (WHNP that) (S (VP can)))) (VP help))

Summary:

For the past two months, I have been working with Alon Lavie and Stephan
Vogel on Chinese and English parse trees to investigate answers to the
following questions:

(a) Can constituency information and word-level alignments be used to
    align nodes in trees of parallel sentences? How precisely matched
    (in meaning) are the yields of these aligned nodes?
(b) Can the parse trees and word-level alignments be used for learning
    reordering rules? If we use these rules to reorder source sentences,
    can we do any better at translation?

The current results show that:

(a) - Node alignments from hand-aligned data are very precise.
    - Using automatic word alignments to align nodes gives over 70%
      precision and over 40% recall.
(b) Using a 10-best reordering of words in the source sentences with
    a "dumb" reordering strategy has shown a 0.005 improvement in BLEU
    score.

I would like to talk about the approaches that we have taken here and to
discuss strategies for improving these results.
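One common way to align tree nodes from word alignments, shown here only as a sketch, is the span-consistency test used in phrase extraction: two nodes are paired when no alignment link connects a word inside one node's yield to a word outside the other's. This is an assumed criterion, not necessarily the one used in the talk, and the span and link representations are invented for the example. The second function computes precision and recall against a gold set of node pairs.

```python
def spans_consistent(src_span, tgt_span, links):
    """Return True if the two inclusive word spans are alignment-consistent.

    links is a collection of (source_index, target_index) word alignments.
    """
    (s_lo, s_hi), (t_lo, t_hi) = src_span, tgt_span
    has_link = False
    for s, t in links:
        s_in = s_lo <= s <= s_hi
        t_in = t_lo <= t <= t_hi
        if s_in != t_in:
            return False  # a link crosses the boundary of one span
        if s_in:
            has_link = True
    return has_link  # require at least one link inside both spans


def align_nodes(src_spans, tgt_spans, links):
    """Pair every source node span with every consistent target node span."""
    return {(s, t) for s in src_spans for t in tgt_spans
            if spans_consistent(s, t, links)}


def precision_recall(predicted, gold):
    """Precision and recall of predicted node pairs against gold node pairs."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall
```

Each node is represented by the inclusive span of word positions it covers, so the tree structure itself is not needed for this check.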

Tuesday, October 9, 2007

Sub-Phrasal Matching and Structural Templates in Example-Based MT

Speaker: Aaron Phillips

Title: Sub-Phrasal Matching and Structural Templates in Example-Based MT

Example-Based Machine Translation (EBMT) encompasses many different
approaches to data-driven MT. In this work I first look at two different
paradigms of EBMT. I then draw on the strengths of these two systems and
build a new engine that combines sub-phrasal matching with structural
templates. The end result is a melding of ideas from EBMT, SMT, and
Xfer. This synthesis results in higher translation quality and more
graceful degradation, yielding 1.5% to 7.5% relative improvement in BLEU
scores.

This work was recently presented at TMI. The full paper can be found
here:

http://dustoftheground.net/techne/research/sub-phrasal_matching_and_structural_templates_in_example-based_mt.pdf

Tuesday, August 14, 2007

Experiments with a Noun-Phrase driven Statistical Machine Translation System

Title: Experiments with a Noun-Phrase driven Statistical Machine
Translation System

Speaker: Sanjika Hewavitharana
Date: Aug 14, 2007
Time: 12:00pm
Place: NSH 3305 [During MT Group Monthly Lunch]

Abstract:

Hierarchical translation models that use phrases with words as well as
sub-phrases have shown better performance than standard phrase-based
systems. In this talk I will present a noun-phrase (NP) driven
statistical machine translation system. Using noun-phrases as the
decomposition unit, we build a two-level hierarchy of phrases. We first
identify noun-phrases in the data and replace them with a tag to produce
an NP-tagged corpus. This corpus is then used to extract NP-tagged
phrase translation pairs. Both noun-phrases and NP-tagged phrases are
used in a two-level translation decoder. The two-level system shows
significant improvements over a baseline phrase-based SMT system. It
also produces longer matching phrases due to the generalization
introduced by tagging noun-phrases.
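The tagging step described above can be sketched as follows. The sketch assumes noun-phrase spans have already been identified by some chunker or parser and are given as non-overlapping [start, end) token ranges; the tag string "@NP" and the function name are hypothetical.

```python
def tag_noun_phrases(tokens, np_spans, tag="@NP"):
    """Replace each noun-phrase span with a single tag token.

    Returns the NP-tagged token list and the list of extracted noun phrases,
    in left-to-right order. np_spans holds half-open (start, end) ranges.
    """
    tagged, phrases = [], []
    pos = 0
    for start, end in sorted(np_spans):
        if start < pos or start >= end or end > len(tokens):
            raise ValueError("spans must be non-empty, in range, and non-overlapping")
        tagged.extend(tokens[pos:start])   # words before this noun phrase
        tagged.append(tag)                 # the noun phrase collapses to one tag
        phrases.append(tokens[start:end])  # keep the phrase for the NP level
        pos = end
    tagged.extend(tokens[pos:])
    return tagged, phrases
```

Because different noun phrases collapse to the same tag, sentences that differ only in their noun phrases yield the same tagged sequence, which is the generalization the abstract credits for longer matching phrases.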

Tuesday, May 22, 2007

In Search of Better MT Evaluation Metric : Some Experiments

Date: May 22, 2007
Presenter: Abhaya Agarwal
Title: In Search of Better MT Evaluation Metric : Some Experiments

Abstract: The area of automatic metrics for MT evaluation has seen a lot of
activity in the last 4-5 years. Starting with BLEU, many such metrics have
been proposed over the years, including METEOR, HTER, and ROUGE. While
these metrics achieve good correlation with human judgments at the
system level, the situation remains bleak at the individual sentence level.
In this talk, I will describe some work that we have been doing towards
developing metrics with improved correlation with human judgments at the
sentence level. I will present the current results and some possible
future directions.
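The gap between system-level and sentence-level correlation can be made concrete with a small sketch. It assumes Pearson correlation and assumes that system-level scores are simple per-system averages of sentence scores; real metrics such as BLEU aggregate differently, and the data layout (a dictionary from system name to a list of per-sentence scores) is invented for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


def system_level_correlation(metric, human):
    """Correlate per-system average scores (one point per system)."""
    systems = sorted(metric)
    metric_means = [sum(metric[s]) / len(metric[s]) for s in systems]
    human_means = [sum(human[s]) / len(human[s]) for s in systems]
    return pearson(metric_means, human_means)


def sentence_level_correlation(metric, human):
    """Correlate scores sentence by sentence, pooled over all systems."""
    systems = sorted(metric)
    xs = [x for s in systems for x in metric[s]]
    ys = [y for s in systems for y in human[s]]
    return pearson(xs, ys)
```

Averaging over sentences cancels per-sentence noise, so a metric can rank systems correctly while still disagreeing with human judgments on individual sentences.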