Date: 9 Sept 2008
Speaker: Wilson Tam
TITLE: Bilingual-LSA based adaptation for statistical machine translation
ABSTRACT:
We propose a novel approach to cross-lingual language model (LM) and translation lexicon adaptation for statistical machine translation based on bilingual Latent Semantic Analysis (bLSA). bLSA enables latent topic distributions to be transferred efficiently across languages by enforcing a one-to-one topic correspondence during training. Within the proposed bLSA framework, model adaptation is performed by first inferring the topic posterior distribution of the source text and then applying the inferred distribution to an N-gram LM of the target language and to the translation lexicon via marginal adaptation; a rough sketch of this step follows the abstract. The background phrase table is then enhanced with additional phrase scores computed using the adapted translation lexicon.
The proposed framework also features rapid bootstrapping of LSA models for new languages from a source LSA model of another language. Our approach was evaluated on the Chinese-to-English MT06 test set. Improvements in BLEU were observed when the adapted LM and the adapted translation lexicon were applied individually. When they were applied simultaneously, the gains were additive, yielding 28.91% BLEU, which is statistically significant at the 95% confidence level with respect to the unadapted baseline of 28.06% BLEU.
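A minimal sketch of the adaptation step described in the abstract, assuming standard LSA fold-in for topic inference and unigram-rescaling-style marginal adaptation; the function names, smoothing floor, and weight beta below are illustrative assumptions, not the speaker's implementation:

```python
from collections import Counter

def infer_topic_posterior(src_doc, src_topic_word, n_iters=20, floor=1e-12):
    """Fold a tokenized source-language document into the shared topic space.

    src_topic_word[k][w] ~ P(w | topic k) for the source language.
    Returns a topic mixture theta with sum(theta) == 1.
    """
    K = len(src_topic_word)
    counts = Counter(src_doc)
    theta = [1.0 / K] * K                     # start from a uniform topic mixture
    for _ in range(n_iters):
        new_theta = [0.0] * K
        for w, c in counts.items():
            like = [theta[k] * src_topic_word[k].get(w, floor) for k in range(K)]
            denom = sum(like)
            for k in range(K):
                new_theta[k] += c * like[k] / denom
        total = sum(new_theta)
        theta = [t / total for t in new_theta]
    return theta

def lsa_marginal(theta, tgt_topic_word, word, floor=1e-12):
    """P_lsa(word) in the target language, reusing the source-side topic posterior."""
    return sum(theta[k] * tgt_topic_word[k].get(word, floor) for k in range(len(theta)))

def adapted_ngram_score(p_bg_ngram, p_bg_unigram, p_lsa_unigram, beta=0.7):
    """Marginal adaptation (unigram rescaling), up to a normalization constant:
       P_adapted(w | h)  proportional to  P_bg(w | h) * (P_lsa(w) / P_bg(w)) ** beta
    """
    return p_bg_ngram * (p_lsa_unigram / p_bg_unigram) ** beta
```

In this sketch the topic posterior is inferred on the source side only; because bLSA enforces a one-to-one topic correspondence, the same posterior can be combined with the target-language topic-word distributions (e.g. Chinese source text rescaling an English LM, as in the MT06 setup) to bias the background N-gram LM. The adapted scores would still need renormalization over the target vocabulary.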
Tuesday, April 22, 2008
Simulating Sentence Pairs Sampling Process via Source and Target Language Models
Speaker: Nguyen Bach
Abstract: In a traditional word alignment process, each sentence pair is assigned the same occurrence count, which is normalized during training to produce the empirical probability. However, some sentence pairs may be more valuable, reliable, and appropriate than others, and should therefore carry a higher weight in training. To address this, we explored methods of resampling sentence pairs. We investigated three sets of features: sentence pair confidence (/sc/), genre-dependent sentence pair confidence (/gdsc/), and sentence-dependent phrase alignment confidence (/sdpc/) scores. These features were calculated over the entire training corpus and can easily be integrated into a phrase-based machine translation system.
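A rough sketch of the resampling idea, hedged: the corpus representation, the weight normalization, and the function name are assumptions for illustration, and the actual /sc/, /gdsc/, and /sdpc/ scores are computed as described in the talk, not here:

```python
import random

def resample_sentence_pairs(corpus, confidence_scores, n_samples=None, seed=0):
    """Draw a resampled training corpus in which each sentence pair occurs with
    frequency roughly proportional to its confidence score, replacing the usual
    assumption that every pair contributes exactly one count to alignment training.

    corpus            -- list of (source_sentence, target_sentence) pairs
    confidence_scores -- non-negative weight per pair (e.g. an /sc/-style score)
    """
    rng = random.Random(seed)
    if n_samples is None:
        n_samples = len(corpus)
    total = float(sum(confidence_scores))
    weights = [s / total for s in confidence_scores]
    # Sampling with replacement: high-confidence pairs tend to appear several
    # times, while low-confidence pairs may drop out of the resampled corpus.
    return rng.choices(corpus, weights=weights, k=n_samples)
```

An equivalent alternative is to keep the corpus fixed and feed the weights into alignment training as per-sentence counts, which is closer to the "occurrence count" view taken in the abstract.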