Monday, June 15, 2009

Making Disfluent Output Slightly Less So: MT System Combination Search Spaces and Optimization

Speaker: Kenneth Heafield

Title: Making Disfluent Output Slightly Less So:
MT System Combination Search Spaces and Optimization

Abstract: System combination merges several machine translation outputs
into a single improved sentence. This talk starts by summarizing the
approach, including hypothesis scoring and a search space derived from
the alignments. The current search space focuses on picking words
in a roughly word-synchronous way. Another search space under development
builds a directed graph in which each set of aligned words corresponds to
a vertex and
each bigram corresponds to a directed edge. Search is conducted much like
a left-to-right MT decoder. Speed optimizations in duplicate handling,
language model state, and multithreading allow decoding at 5.5 sentences
per second and also apply to other MT systems. This speed
allows me to find hyperparameters by searching hundreds of parameter
combinations, each with a full round of tuning. In preparation for
last Friday's NIST submission, system combination improved by 2.4 BLEU
points over the best component system for Urdu-to-English translation.
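
The graph-based search space mentioned in the abstract can be illustrated
with a small sketch. This is not the speaker's implementation: the function
name build_bigram_graph, the (system, position) indexing, and the format of
aligned_groups are all assumptions made for illustration. It merges each
group of aligned words into one vertex and adds a directed edge for every
bigram observed in a system output.

```python
from collections import defaultdict


def build_bigram_graph(outputs, aligned_groups):
    """Build a directed graph from several tokenized MT outputs.

    outputs: list of token lists, one per component system (hypothetical format).
    aligned_groups: list of sets of (system_index, word_index) positions that
        an aligner has judged to be the same word (hypothetical format).
    Returns (vertex_of, edges): a map from position to vertex id, and a map
    from vertex id to the set of successor vertex ids.
    """
    vertex_of = {}
    # Every group of aligned words collapses into a single vertex.
    for vertex_id, group in enumerate(aligned_groups):
        for position in group:
            vertex_of[position] = vertex_id

    # Words that were not aligned to anything get their own vertex.
    next_id = len(aligned_groups)
    for system, words in enumerate(outputs):
        for index in range(len(words)):
            if (system, index) not in vertex_of:
                vertex_of[(system, index)] = next_id
                next_id += 1

    # Each bigram in each system output becomes a directed edge.
    edges = defaultdict(set)
    for system, words in enumerate(outputs):
        for index in range(len(words) - 1):
            source = vertex_of[(system, index)]
            target = vertex_of[(system, index + 1)]
            edges[source].add(target)
    return vertex_of, dict(edges)
```

A left-to-right search over such a graph would extend a partial hypothesis
by following outgoing edges from its last vertex, scoring each extension as
it goes. The scoring features themselves are not specified in the abstract
and are left out of this sketch.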