Thursday, March 12, 2009

Moving Beyond Phrase-Pairs: Dynamically Scoring Collections of Translation Examples

Who: Aaron B. Phillips
When: Friday Mar 13, 12:00pm
Where: NSH 1507

Abstract:

Statistical Machine Translation has prospered because it is based on
models that are consistent and straightforward to optimize. The
log-linear model in particular allows the researcher to exploit
numerous, possibly dependent, features. However, the modeling approach
taken by SMT enforces a particular top-down view of the data using
phrase-pairs that does not easily allow for the integration of features
that may change from example to example. What I propose is a shift in
how the model is built. Inspired by Example-Based Machine Translation, I
calculate features for each example separately, but, as in SMT, this
information is collected into a single log-linear model that is
straightforward to optimize. This is accomplished by identifying at
run-time the most appropriate collection of translation examples instead
of using precomputed phrase-pairs. A search is performed over the
example-specific features, such as alignment quality, genre, or
context, to determine the collection that maximizes the score. The weights
for each example-specific feature are adjustable during optimization and
allow for a trade-off between forming collections over all the examples
and forming collections that consist of a few high-quality examples.
This framework seeks to unify the approaches of EBMT and SMT. It results
in a model that is straightforward to optimize *and* allows the
integration of novel example-specific features.
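
As a rough illustration of the idea in the abstract, and not the system
presented in the talk, the sketch below scores a collection of translation
examples with a small log-linear model and searches for the best collection.
Everything in it is an assumption made for illustration: the Example class,
the feature names, the use of mean log feature values, and the size_weight
term that rewards larger collections. With this particular scoring function,
the best collection of any given size is always the top-ranked examples, so
checking each prefix of the ranked list is sufficient.

```python
import math
from dataclasses import dataclass


@dataclass
class Example:
    """A hypothetical translation example with per-example features in (0, 1]."""
    target: str
    features: dict


def collection_score(examples, weights, size_weight):
    """Log-linear score of a collection (assumed form, for illustration only).

    Sums the weighted mean log value of each example-specific feature and
    adds size_weight * log(collection size) to reward larger collections.
    """
    n = len(examples)
    score = size_weight * math.log(n)
    for name, w in weights.items():
        mean_log = sum(math.log(e.features[name]) for e in examples) / n
        score += w * mean_log
    return score


def best_collection(examples, weights, size_weight):
    """Rank examples by weighted log features and return the best-scoring prefix."""
    def quality(e):
        return sum(w * math.log(e.features[name]) for name, w in weights.items())

    ranked = sorted(examples, key=quality, reverse=True)
    prefixes = [ranked[:k] for k in range(1, len(ranked) + 1)]
    return max(prefixes, key=lambda c: collection_score(c, weights, size_weight))


# Toy data: two good examples and one poorly aligned, out-of-genre example.
examples = [
    Example("the house", {"alignment": 0.9, "genre": 1.0}),
    Example("the home", {"alignment": 0.8, "genre": 0.9}),
    Example("a building", {"alignment": 0.1, "genre": 0.2}),
]
weights = {"alignment": 1.0, "genre": 1.0}

for size_weight in (0.0, 1.0, 5.0):
    chosen = best_collection(examples, weights, size_weight)
    print(size_weight, [e.target for e in chosen])
```

In this sketch, raising size_weight pushes the search toward collections
over all the examples, while lowering it favors a few high-quality examples,
which mirrors the trade-off described in the abstract. In a real system the
weights would be tuned during optimization rather than set by hand.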