Wednesday, September 7, 2011

Training Machine Translation with a Second-Order Taylor Approximation of Weighted Translation Instances

Title: Training Machine Translation with a Second-Order Taylor Approximation of Weighted Translation Instances
Speaker: Aaron Phillips
When: Tuesday, September 13, 12:00 Noon to 1:00pm.
Where: GHC 6501

Abstract: The Cunei Machine Translation Platform is an open-source MT system designed to model instances of translation. One of the challenges of this approach is effective training. We describe two techniques that improve the training procedure and allow us to leverage the strengths of instance-based modeling. First, during training we approximate our model with a second-order Taylor series. Second, we discount models based on the magnitude of their approximation. By reducing error in training, our model now consistently outperforms the standard SMT model, with gains ranging from 0.51 to 3.77 BLEU on German-English and Czech-English test sets.
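
For reference, a generic second-order Taylor expansion has the form sketched below. The symbols f, theta, theta_0, and H are illustrative assumptions; the abstract does not say which function Cunei expands or around which point.

```latex
% Generic second-order Taylor expansion of a scalar function f around \theta_0.
% f, \theta, \theta_0, and H are illustrative symbols, not Cunei's own notation.
\[
f(\theta) \approx f(\theta_0)
  + \nabla f(\theta_0)^{\top} (\theta - \theta_0)
  + \tfrac{1}{2} (\theta - \theta_0)^{\top} H(\theta_0)\, (\theta - \theta_0)
\]
```

Here H(theta_0) denotes the Hessian of f at theta_0. The approximation is most accurate near theta_0 and degrades as theta moves away from it.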

Monday, May 16, 2011

Syntax-to-Morphology Mapping in Factored Phrase-Based SMT (English and Turkish)

Title: Syntax-to-Morphology Mapping in Factored Phrase-Based
Statistical Machine Translation between English and Turkish

Speaker: Reyyan Yeniterzi

When: Tuesday, May 17 at 12:15pm
Where: GHC 6501

Abstract:

Motivated by the observation that many local and some nonlocal
syntactic structures in English essentially map to morphologically
complex words in Turkish, a new approach called syntax-to-morphology
mapping was recently introduced (Yeniterzi and Oflazer, 2010). This
approach maps syntactic structures in English directly to complex
words in Turkish. It recognizes certain local and nonlocal syntactic
structures on the English side, packages them, and attaches them to
their heads to obtain parallel morphological structures.

With the help of this method, one can identify and reorganize phrases
on the English side to align English syntax with Turkish morphology.
Furthermore, with this method, continuous and discontinuous variants of
certain (syntactic) source phrases can be conflated during the SMT
phrase extraction process. Since most function words encoding syntax
are now abstracted into complex tags, the length of the English
sentences can be dramatically reduced.

The initial experiments were performed on an English-to-Turkish SMT
system. In this project, we built upon this initial system with
lexical reordering and data augmentation. We also applied
syntax-to-morphology mapping to a Turkish-to-English SMT system for
the first time.

This is joint work with Kemal Oflazer from Qatar CMU. It was presented
at the Machine Translation and Morphologically-rich Languages Research
Workshop in Haifa, Israel, in January 2011.

Tuesday, May 3, 2011

Better Hypothesis Testing for Statistical Machine Translation: Controlling for Optimizer Instability

Title: Better Hypothesis Testing for Statistical Machine Translation:
Controlling for Optimizer Instability
Speaker: Jonathan Clark
When: Tuesday, 4/19 at Noon

Abstract:
In statistical machine translation, a researcher seeks to determine
whether some innovation (e.g., a new feature, model, or inference
algorithm) improves translation quality in comparison to a baseline
system. To answer this question, he runs an experiment to evaluate the
behavior of the two systems on held-out data. In this paper, we
consider how to make such experiments more statistically reliable. We
provide a systematic analysis of the effects of optimizer instability
(an extraneous variable that is seldom controlled for) on experimental
outcomes, and make recommendations for reporting results more
accurately.
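
As a minimal sketch of what controlling for optimizer instability can look like, the snippet below summarizes scores from several optimizer runs per system by their mean and sample standard deviation, instead of comparing single runs. This is an assumed illustration, not the procedure from the paper: the helper name summarize_runs is hypothetical and the BLEU values are invented.

```python
from statistics import mean, stdev


def summarize_runs(scores):
    """Return (mean, sample standard deviation) over repeated optimizer runs."""
    if len(scores) < 2:
        raise ValueError("need at least two optimizer runs to estimate variance")
    return mean(scores), stdev(scores)


# Hypothetical BLEU scores from three optimizer runs per system.
# These numbers are invented for illustration only.
baseline_bleu = [25.0, 25.4, 24.8]
system_bleu = [25.6, 25.1, 25.9]

base_mean, base_sd = summarize_runs(baseline_bleu)
sys_mean, sys_sd = summarize_runs(system_bleu)

print(f"baseline: {base_mean:.2f} +/- {base_sd:.2f}")
print(f"system:   {sys_mean:.2f} +/- {sys_sd:.2f}")
print(f"difference of means: {sys_mean - base_mean:.2f}")
```

When the difference of means is small relative to the run-to-run spread, a single pair of runs could easily point in either direction.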

This is joint work with Chris Dyer, Alon Lavie, and Noah Smith. It was
recently accepted for publication as an ACL short paper.

Wednesday, March 2, 2011

Qin Gao: Expanding parallel corpora for machine translation

Speaker: Qin Gao
When: Noon, March 8, 2011
Where: GHC 4405

We present an approach to expanding parallel corpora for machine translation. By applying semantic role labeling (SRL) to one side of the language pair, we extract SRL substitution rules from an existing parallel corpus. The rules are then used to generate new sentence pairs. An SVM classifier is built to filter the generated sentence pairs. The filtered corpus is used to train phrase-based translation models, which can be used directly in translation tasks or combined with baseline models. Experimental results on Chinese-English machine translation tasks show an average improvement of 0.45 BLEU and 1.22 TER points across 5 different NIST test sets.

Thursday, January 13, 2011

Machine Translation and Computer-Assisted Translation

Title: Prospects for Integrating Machine Translation and Computer-Assisted Translation in the Translation Industry

Speaker: Gregory M. Shreve from the Department of Modern and Classical Language Studies at Kent State University and colleagues
Location: GHC 6115
Time: 12:30 pm, 14 Jan 2011


The speaker's CV can be found at http://www.kent.edu/mcls/faculty/mcls_shreve.cfm.