Abstract: Example-Based Machine Translation (EBMT), like other corpus-based methods, requires substantial parallel training data. One way to reduce data requirements and improve translation quality is to generalize parts of the parallel corpus into translation templates. This automated generalization process requires clustering. In most clustering approaches, the optimal number of clusters (N) is found empirically on a development set, which often takes several days. We introduce a spectral clustering framework that automatically estimates the optimal N and removes unstable oscillating points. The new framework produces significant improvements in low-resource EBMT settings for English-to-French (~1.4 BLEU points), English-to-Chinese (~1 BLEU point), and English-to-Haitian (~2 BLEU points). Translation quality with templates created using the automatically estimated N was almost the same as with the empirically found best N. Discarding “incoherent” points gives a further boost in translation scores, even above those obtained with the empirically found N.
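The abstract does not say how N is estimated. As an illustration only, one common way to pick the number of clusters in spectral clustering is the eigengap heuristic: compute the eigenvalues of the normalized graph Laplacian and choose N at the largest gap between consecutive eigenvalues. The sketch below assumes a Gaussian affinity and uses hypothetical function names (`gaussian_affinity`, `estimate_num_clusters`); it is not necessarily the method presented in the talk.

```python
import numpy as np


def gaussian_affinity(points, sigma=1.0):
    """Pairwise Gaussian (RBF) affinity matrix; sigma is an assumed bandwidth."""
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def estimate_num_clusters(affinity, max_clusters=10):
    """Eigengap heuristic (an assumption, not the talk's confirmed method).

    Returns the N for which the gap between the N-th and (N+1)-th smallest
    eigenvalues of the symmetric normalized Laplacian is largest.
    """
    affinity = np.asarray(affinity, dtype=float)
    degree = affinity.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    laplacian = (
        np.eye(len(affinity))
        - inv_sqrt[:, None] * affinity * inv_sqrt[None, :]
    )
    eigenvalues = np.sort(np.linalg.eigvalsh(laplacian))
    # Only consider the smallest eigenvalues, up to max_clusters candidates.
    k = min(max_clusters, len(eigenvalues) - 1)
    gaps = np.diff(eigenvalues[: k + 1])
    return int(np.argmax(gaps)) + 1
```

On well-separated data the Laplacian has roughly one near-zero eigenvalue per cluster, so the largest gap falls right after them. On noisy data the gap can be ambiguous, which is one reason a framework might also discard unstable points before clustering.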
The MT Lunch Seminar Series is an informal discussion group where researchers in the area of Machine Translation present their research and seek feedback from the MT groups at CMU. Talks are scheduled for the 2nd Tuesday of the month at noon in GHC 4405, unless otherwise noted.