Monday, September 8, 2008

Bilingual-LSA based adaptation for statistical machine translation

Date: 9 Sept 2008
Speaker: Wilson Tam
TITLE: Bilingual-LSA based adaptation for statistical machine translation

ABSTRACT:
We propose a novel approach to cross-lingual language model (LM) and translation lexicon adaptation for statistical machine translation, based on bilingual Latent Semantic Analysis (bLSA). bLSA enables latent topic distributions to be transferred efficiently across languages by enforcing a one-to-one topic correspondence during training. Within the proposed bLSA framework, model adaptation proceeds in two steps: first, the topic posterior distribution of the source text is inferred; then the inferred distribution is applied to the target-language N-gram LM and to the translation lexicon via marginal adaptation. The background phrase table is then enhanced with additional phrase scores computed from the adapted translation lexicon.
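The two adaptation steps can be illustrated with a toy sketch. The abstract does not give the formulas, so this assumes a commonly used form of marginal adaptation, in which each background probability is rescaled by the ratio of the topic-adapted unigram to the background unigram, raised to a tuning exponent, and then renormalized. The function names, the exponent `beta`, and all probabilities are hypothetical and chosen only for illustration.

```python
# Toy sketch only: names, numbers and the exponent beta are assumptions,
# not taken from the talk.

def topic_marginal(theta, topic_word):
    """Unigram marginal p(w) = sum_k theta[k] * topic_word[k][w]."""
    vocab = topic_word[0].keys()
    return {w: sum(t * tw[w] for t, tw in zip(theta, topic_word)) for w in vocab}

def marginal_adapt(bg_cond, bg_unigram, lsa_unigram, beta=0.5):
    """Rescale p_bg(w|h) by (p_lsa(w) / p_bg(w)) ** beta, then renormalize."""
    scaled = {
        w: p * (lsa_unigram[w] / bg_unigram[w]) ** beta
        for w, p in bg_cond.items()
    }
    z = sum(scaled.values())
    return {w: p / z for w, p in scaled.items()}

# Step 1: topic posterior inferred from the source-language text
# (hypothetical values; inference itself is not shown).
theta = [0.9, 0.1]

# Target-language topic-word distributions. Topic k here is assumed to
# correspond one-to-one to source topic k, so theta can be reused directly.
target_topics = [
    {"bank": 0.6, "river": 0.1, "loan": 0.3},
    {"bank": 0.3, "river": 0.6, "loan": 0.1},
]

# Background target-language model (hypothetical values).
bg_unigram = {"bank": 0.4, "river": 0.4, "loan": 0.2}
bg_cond = {"bank": 0.5, "river": 0.3, "loan": 0.2}  # p_bg(w | some history)

# Step 2: apply the transferred topic distribution via marginal adaptation.
lsa_unigram = topic_marginal(theta, target_topics)
adapted = marginal_adapt(bg_cond, bg_unigram, lsa_unigram)
```

In this toy setting the source text is dominated by the first topic, so the adapted distribution shifts probability mass toward "loan" and away from "river" while still summing to one. The same rescaling could in principle be applied to lexicon entries in place of LM probabilities.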

The proposed framework also supports rapid bootstrapping of LSA models for new languages from a source LSA model of another language. Our approach was evaluated on the Chinese-to-English MT06 test set. BLEU improved when the adapted LM and the adapted translation lexicon were each applied individually. When both were applied simultaneously, the gains were additive, yielding a BLEU score of 28.91% against 28.06% for the unadapted baseline, an improvement that is statistically significant at the 95% confidence level.