Tuesday, August 14, 2007

Experiments with a Noun-Phrase driven Statistical Machine Translation System

Title: Experiments with a Noun-Phrase driven Statistical Machine
Translation System

Speaker: Sanjika Hewavitharana
Date: Aug 14, 2007
Time: 12:00pm
Place: NSH 3305 [During MT Group Monthly Lunch]

Abstract:

Hierarchical translation models that use phrases with words as well as
sub-phrases have shown better performance than standard phrase-based
systems. In this talk, I will present a noun-phrase (NP) driven
statistical machine translation system. Using noun-phrases as the
decomposition unit, we build a two-level hierarchy of phrases. We first
identify noun-phrases in the data and replace them with a tag to produce
an NP-tagged corpus. This corpus is then used to extract NP-tagged
phrase translation pairs. Both noun-phrases and NP-tagged phrases are
used in a two-level translation decoder. The two-level system shows
significant improvements over a baseline phrase-based SMT system. It
also produces longer matching phrases due to the generalization
introduced by tagging noun-phrases.
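
To make the two-level idea concrete, here is a minimal sketch in Python.
It is not the speaker's system. The "@NP" placeholder, the function
names, the toy Spanish-English tables, and the hand-supplied NP spans
are all assumptions made for illustration. The second level is a plain
exact-match lookup rather than a real decoder search, and NP slots are
filled left to right, which assumes the noun phrases keep their order in
the target sentence.

```python
from typing import Dict, List, Tuple

NP_TAG = "@NP"  # hypothetical placeholder symbol for a tagged noun phrase


def tag_noun_phrases(
    tokens: List[str], np_spans: List[Tuple[int, int]]
) -> Tuple[List[str], List[List[str]]]:
    """Replace each non-overlapping (start, end) span with NP_TAG.

    Returns the NP-tagged token list and the removed noun phrases,
    in left-to-right order. Spans are half-open: tokens[start:end].
    """
    tagged: List[str] = []
    noun_phrases: List[List[str]] = []
    pos = 0
    for start, end in sorted(np_spans):
        tagged.extend(tokens[pos:start])   # copy words before the NP
        tagged.append(NP_TAG)              # stand-in for the NP itself
        noun_phrases.append(tokens[start:end])
        pos = end
    tagged.extend(tokens[pos:])            # copy words after the last NP
    return tagged, noun_phrases


def translate_two_level(
    tokens: List[str],
    np_spans: List[Tuple[int, int]],
    np_table: Dict[str, str],
    tagged_table: Dict[str, str],
) -> str:
    """Toy two-level translation: NPs first, then the NP-tagged phrase."""
    tagged, nps = tag_noun_phrases(tokens, np_spans)
    # Level 1: translate each noun phrase on its own.
    np_translations = iter(np_table[" ".join(np)] for np in nps)
    # Level 2: translate the NP-tagged phrase (exact lookup in this sketch).
    template = tagged_table[" ".join(tagged)]
    # Fill the NP slots left to right (assumes monotone NP order).
    return " ".join(
        next(np_translations) if tok == NP_TAG else tok
        for tok in template.split()
    )


# Toy tables, invented for this example.
NP_TABLE = {
    "el hombre viejo": "the old man",
    "un libro largo": "a long book",
    "la mujer": "the woman",
    "el diario": "the newspaper",
}
TAGGED_TABLE = {"@NP lee @NP": "@NP reads @NP"}

first = translate_two_level(
    "el hombre viejo lee un libro largo".split(),
    [(0, 3), (4, 7)], NP_TABLE, TAGGED_TABLE,
)
second = translate_two_level(
    "la mujer lee el diario".split(),
    [(0, 2), (3, 5)], NP_TABLE, TAGGED_TABLE,
)
print(first)
print(second)
```

Both sentences are covered by the single NP-tagged entry "@NP lee @NP",
even though they share only one surface word. This is the kind of
generalization the abstract credits for longer matching phrases.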