Speaker: Thuylinh Nguyen
Title: Nonparametric Word Segmentation for Machine Translation
Thursday 18 Feb 2010. 12-1:30pm in GHC 4405.
In this talk we present an unsupervised word segmentation model for machine
translation. The model builds on existing nonparametric monolingual
segmentations. The monolingual segmentation model and the bilingual word
alignment model are coupled so that source text segmentation optimizes
the one-to-one mapping with the target text. Often, there are words in
the source language that do not appear in the target language, and vice
versa. Our model therefore accounts for source-language word deletion and
word insertion. Experiments show improvements on Arabic-English and
Chinese-English translation tasks.