Monday, April 12, 2010

Generalized templates for EBMT

Speaker: Rashmi Gangadharaiah
Location: GHC 6501

Topic: Generalized templates for EBMT

Abstract:
-----------
Example-Based Machine Translation (EBMT), like other corpus-based methods, requires substantial parallel training data. One way to reduce data requirements and improve translation quality is to generalize parts of the parallel corpus into translation templates. This automated generalization process requires clustering. In most clustering approaches, the optimal number of clusters (N) is found empirically on a development set, a search that often takes several days. We introduce a spectral clustering framework that automatically estimates the optimal N and removes unstable oscillating points. The new framework produces significant improvements in low-resource EBMT settings for English-to-French (~1.4 BLEU points), English-to-Chinese (~1 BLEU point), and English-to-Haitian (~2 BLEU points). Translation quality was almost the same whether the templates were created with the automatically estimated N or with the best N found empirically. Discarding “incoherent” points gives a further boost in translation scores, exceeding even the scores obtained with the empirically found N.
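The abstract does not say how the framework estimates N. One common heuristic in spectral clustering is the eigengap: build an affinity graph, compute the eigenvalues of its normalized Laplacian, and pick the N that precedes the largest jump between consecutive eigenvalues. The sketch below illustrates only that generic heuristic and is not the speaker's framework. It assumes the items to be clustered are already numeric feature vectors, uses a Gaussian affinity with an assumed bandwidth `sigma`, and does not model the removal of oscillating points. The function name `estimate_num_clusters` and its parameters are hypothetical.

```python
import numpy as np


def estimate_num_clusters(points, sigma=1.0, max_k=10):
    """Hypothetical helper: estimate N with the eigengap heuristic.

    Assumes `points` are numeric feature vectors and a Gaussian affinity
    with bandwidth `sigma` (an illustrative choice, not from the talk).
    """
    points = np.asarray(points, dtype=float)
    # Pairwise squared Euclidean distances via broadcasting.
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Gaussian (RBF) affinity matrix with no self-loops.
    affinity = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(affinity, 0.0)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    laplacian = np.eye(len(points)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    eigenvalues = np.linalg.eigvalsh(laplacian)
    max_k = min(max_k, len(points) - 1)
    # gaps[i] is the jump after the (i + 1)-th smallest eigenvalue.
    gaps = np.diff(eigenvalues[: max_k + 1])
    return int(np.argmax(gaps)) + 1


# Usage: three tight, well-separated groups of 2-D points.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
points = np.vstack([c + 0.1 * rng.standard_normal((10, 2)) for c in centers])
print(estimate_num_clusters(points))  # prints 3
```

With well-separated groups, the Laplacian has one near-zero eigenvalue per group, so the largest gap appears right after the third eigenvalue. On real translation data the gap is rarely this clean, which is one reason N is often tuned empirically instead.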