Monday, May 16, 2011

Syntax-to-Morphology Mapping in Factored Phrase-Based SMT (English and Turkish)

Title: Syntax-to-Morphology Mapping in Factored Phrase-Based
Statistical Machine Translation between English and Turkish

Speaker: Reyyan Yeniterzi

When: Tuesday, May 17 at 12:15pm
Where: GHC 6501

Abstract:

Many local and some nonlocal syntactic structures in English
essentially map to morphologically complex words in Turkish. Motivated
by this observation, a new approach called syntax-to-morphology
mapping was recently introduced (Yeniterzi and Oflazer, 2010). This
approach maps syntactic structures in English directly to complex
words in Turkish. It recognizes certain local and nonlocal syntactic
structures on the English side, packages those structures, and
attaches them to their heads to obtain parallel morphological
structures.

With this method, one can identify and reorganize phrases on the
English side to align English syntax with Turkish morphology.
Furthermore, continuous and discontinuous variants of certain
(syntactic) source phrases can be conflated during the SMT phrase
extraction process. Since most function words encoding syntax are now
abstracted into complex tags, the length of the English sentences can
be dramatically reduced.
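
As a rough illustration of the packaging idea, the toy sketch below
folds function words into tags on their syntactic heads. It is not the
system described in the talk: the Token format, the relation labels in
FUNCTION_RELS, and the "+"-joined tag notation are all assumptions
made up for this example, and the dependency analysis is written by
hand rather than produced by a parser.

```python
# Toy illustration only. The token format, relation labels, and tag
# notation are assumptions, not the format used in the actual system.
from typing import List, NamedTuple


class Token(NamedTuple):
    word: str  # surface form (or lemma) of the word
    pos: str   # part-of-speech tag
    head: int  # index of the syntactic head, -1 for the root
    rel: str   # dependency relation to the head


# Hypothetical set of relations whose dependents count as function words.
FUNCTION_RELS = {"prep", "poss", "det"}


def package(tokens: List[Token]) -> List[str]:
    """Fold function words into complex tags attached to their heads."""
    tags = {i: [] for i in range(len(tokens))}
    for tok in tokens:
        if tok.rel in FUNCTION_RELS:
            tags[tok.head].append(f"{tok.word}_{tok.pos}")

    packaged = []
    for i, tok in enumerate(tokens):
        if tok.rel in FUNCTION_RELS:
            continue  # the function word now lives on its head as a tag
        packaged.append("+".join([tok.word, tok.pos] + tags[i]))
    return packaged


# Hand-written analysis of "in their economic relations".
sentence = [
    Token("in", "IN", 3, "prep"),
    Token("their", "PRP$", 3, "poss"),
    Token("economic", "JJ", 3, "amod"),
    Token("relation", "NNS", -1, "root"),
]

print(package(sentence))
# ['economic+JJ', 'relation+NNS+in_IN+their_PRP$']
```

In this toy case the four English tokens shrink to two, which is
closer in shape to a two-word Turkish rendering such as "ekonomik
ilişkilerinde", where the preposition and the possessive are expressed
as suffixes on the noun.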

The initial experiments were performed on an English-to-Turkish SMT
system. In this project, we built upon this initial system by adding
lexical reordering and data augmentation. We also applied
syntax-to-morphology mapping to a Turkish-to-English SMT system for
the first time.

This is joint work with Kemal Oflazer of Carnegie Mellon University in
Qatar. It was presented at the Machine Translation and
Morphologically-rich Languages Research Workshop in Haifa, Israel, in
January 2011.
