Monday, August 10, 2009

Two talks

Talk 1:
Nguyen Bach: Source-side Dependency Tree Reordering Models with Subtree Movements and Constraints

Abstract: We propose a novel source-side dependency tree reordering model for statistical machine translation, in which subtree movements and constraints are represented as reordering events associated with the widely used lexicalized reordering models. This model allows us to not only efficiently capture the statistical distribution of the subtree-to-subtree transitions in training data, but also utilize it directly at the decoding time to guide the search process. Using subtree movements and constraints as features in a log-linear model, we are able to help the reordering models make better selections. It also allows the subtle importance of monolingual syntactic movements to be learned alongside other reordering features. We show improvements in translation quality in English-Spanish and English-Iraqi translation tasks.

This is joint work with Qin Gao and Stephan Vogel.

Talk 2:
Francisco (Paco) Guzman: Reassessment of the Role of Phrase Extraction in SMT

Abstract: In this paper we study in detail the relation between word alignment and phrase extraction. First, we analyze different word alignments according to several characteristics and compare them to hand-aligned data. Then, we analyze the phrase-pairs generated by these alignments. We observed that the number of unaligned words has a large impact on the characteristics of the phrase table. A manual evaluation of phrase pair quality showed that the increase in the number of unaligned words results in a lower quality. Finally, we present translation results from using the number of unaligned words as features from which we obtain up to 2BP of improvement.

This is joint work with Qin Gao and Stephan Vogel.