Monday, May 16, 2011

Syntax-to-Morphology Mapping in Factored Phrase-Based SMT (English and Turkish)

Title: Syntax-to-Morphology Mapping in Factored Phrase-Based
Statistical Machine Translation between English and Turkish

Speaker: Reyyan Yeniterzi

When: Tuesday, May 17 at 12:15pm
Where: GHC 6501

Abstract:

Motivated by the observation that many local and some nonlocal
syntactic structures in English essentially map to morphologically
complex words in Turkish, a new approach called syntax-to-morphology
mapping was recently introduced (Yeniterzi and Oflazer, 2010). This
approach maps syntactic structures in English directly to complex
words in Turkish. It recognizes certain local and nonlocal syntactic
structures on the English side, packages them, and attaches them to
their heads to obtain parallel morphological structures.

With this method, one can identify and reorganize phrases on the
English side to align English syntax with Turkish morphology.
Furthermore, continuous and discontinuous variants of certain
(syntactic) source phrases can be conflated during the SMT phrase
extraction process. Since most function words encoding syntax are now
abstracted into complex tags, the length of the English sentences can
be dramatically reduced.
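
The packaging step can be illustrated with a toy sketch. It is not the
authors' implementation: the Token class, the "attach" flag, the "+"
tag format, and the hand-annotated example phrase are all assumptions
made for illustration. It only shows how folding function words into
the tag of their head shortens the English side.

```python
from dataclasses import dataclass


@dataclass
class Token:
    word: str
    tag: str              # part-of-speech tag (hand-assigned here)
    head: int             # index of the head token, -1 for the root
    attach: bool = False  # hypothetical flag: fold this word into its head


def fold_function_words(tokens):
    """Fold each marked function word into its head's tag and drop it.

    Assumes a marked word always attaches to an unmarked head
    (no chains of function words), which a real system would not.
    """
    extra = {i: [] for i in range(len(tokens))}
    for tok in tokens:
        if tok.attach:
            extra[tok.head].append(tok.word)

    packed = []
    for i, tok in enumerate(tokens):
        if tok.attach:
            continue  # the word now lives inside its head's tag
        packed.append("+".join([tok.word, tok.tag] + extra[i]))
    return packed


# Hand-annotated toy phrase: "on their economic relations"
sentence = [
    Token("on", "IN", 3, attach=True),
    Token("their", "PRP$", 3, attach=True),
    Token("economic", "JJ", 3),
    Token("relations", "NNS", -1),
]
packed = fold_function_words(sentence)
print(packed)
```

In this toy case four source tokens become two, with the preposition
and the possessive carried as part of the tag on the head noun.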

The initial experiments were performed on an English-to-Turkish SMT
system. In this project, we built upon this initial system by adding
lexical reordering and data augmentation. We also applied
syntax-to-morphology mapping to a Turkish-to-English SMT system for
the first time.

This is joint work with Kemal Oflazer from Qatar CMU. It was presented
at the Machine Translation and Morphologically-rich Languages Research
Workshop in Haifa, Israel, in January 2011.

Tuesday, May 3, 2011

Better Hypothesis Testing for Statistical Machine Translation: Controlling for Optimizer Instability

Title: Better Hypothesis Testing for Statistical Machine Translation:
Controlling for Optimizer Instability
Speaker: Jonathan Clark
When: Tuesday, 4/19 at Noon

Abstract:
In statistical machine translation, a researcher seeks to determine
whether some innovation (e.g., a new feature, model, or inference
algorithm) improves translation quality in comparison to a baseline
system. To answer this question, he runs an experiment to evaluate the
behavior of the two systems on held-out data. In this paper, we
consider how to make such experiments more statistically reliable. We
provide a systematic analysis of the effects of optimizer instability
(an extraneous variable that is seldom controlled for) on experimental
outcomes, and make recommendations for reporting results more
accurately.
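
One simple way to expose optimizer instability, sketched here as an
assumption rather than as the procedure from the paper, is to repeat
the tuning step several times per system and report the mean and
standard deviation of the held-out metric. The scores below are
made-up placeholder numbers, not experimental results.

```python
from statistics import mean, stdev


def summarize(scores):
    """Return (mean, sample standard deviation) over optimizer replicas."""
    return mean(scores), stdev(scores)


# Placeholder metric scores from three hypothetical tuning runs each.
baseline_runs = [25.0, 25.4, 24.8]
system_runs = [25.6, 25.1, 25.9]

base_mean, base_sd = summarize(baseline_runs)
sys_mean, sys_sd = summarize(system_runs)

print(f"baseline: {base_mean:.2f} +/- {base_sd:.2f}")
print(f"system:   {sys_mean:.2f} +/- {sys_sd:.2f}")
```

If the gap between the two means is small relative to the spread
across runs, a single-run comparison could point either way.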

This is joint work with Chris Dyer, Alon Lavie, and Noah Smith. It was
recently accepted for publication as an ACL short paper.