Tuesday, July 15, 2008

Improving Lexical Coverage of Syntax-driven MT by Re-structuring Non-isomorphic Trees

Speaker: Vamshi Ambati
Date: 15 July 2008

Syntax-based approaches to statistical MT require syntax-aware methods for acquiring their underlying translation models from parallel data. This acquisition process can be driven by syntactic trees for either the source or target language, or by trees on both sides. Work to date has demonstrated that using trees for both sides suffers from severe coverage problems. Approaches that project from trees on one side, on the other hand, have higher levels of recall, but suffer from lower precision, due to the lack of syntactically-aware word alignments.

In this talk I first discuss extraction and the lexical coverage of the translation models learned in both of these scenarios. We will specifically look at how the non-isomorphic nature of the parse trees for the two languages effects recall and coverage. I will then discuss a novel technique for restructuring target parse trees, that generates highly isomorphic target trees that preserve the syntactic boundaries of constituents that were aligned in the original parse trees. I will conclude by discussing some experimental evaluation with an English-French MT System.

No comments: