Wednesday, January 13, 2010

LoonyBin: Making Empirical MT Reproducible, Efficient, and Less Annoying

Speaker: Jonathan Clark
When: Tuesday, January 19 at Noon
Where: GHC 6501
What: Free Knowledge and Free Food
Title: LoonyBin: Making Empirical MT Reproducible, Efficient, and Less Annoying

Abstract: Construction of machine translation systems has evolved into
a multi-stage workflow involving many complicated dependencies. Many
decoder distributions have addressed this by including monolithic
training scripts – train-factored-model.pl for Moses and mr_runmer.pl
for SAMT. However, such scripts can be tricky to modify for novel
experiments and typically have limited support for the variety of job
schedulers found on academic and commercial computer clusters. Further
complicating these systems are hyperparameters, which often cannot be
directly optimized by conventional methods, requiring users to
determine which combination of values is best via trial and error. The
recently released LoonyBin open-source workflow management tool
addresses these issues by providing: 1) a visual interface for the
user to create and modify workflows; 2) a well-defined logging
mechanism; 3) a script generator that compiles visual workflows into
shell scripts; and 4) the concept of Hyperworkflows, which intuitively
and succinctly encode small experimental variations within a larger
workflow. We also describe the Machine Translation Toolpack for
LoonyBin, which exposes state-of-the-art machine translation tools as
drag-and-drop components within LoonyBin.
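The abstract describes the script generator and Hyperworkflows only at a high level. As a rough illustration of the general idea (not LoonyBin's actual file format, API, or implementation), the Python sketch below expands a set of hypothetical "branch points" into every combination of values. It then compiles each combination into a bash script that runs the workflow steps in dependency order and appends one log line per step. All step names, scripts, and options here are invented placeholders, not real MT tools.

```python
import itertools
import shlex
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical workflow: step name -> (dependencies, command template).
# Script names and options are placeholders, not real MT tools.
STEPS = {
    "align": ([], "./align.sh --aligner {aligner}"),
    "extract": (["align"], "./extract_rules.sh --max-len {max_phrase_len}"),
    "lm": ([], "./train_lm.sh --order 5"),
    "tune": (["extract", "lm"], "./tune.sh"),
}

# Hypothetical branch points: each list holds the values of one small
# experimental variation to try within the larger workflow.
BRANCH_POINTS = {
    "aligner": ["giza", "berkeley"],
    "max_phrase_len": ["7", "10"],
}


def realizations(branch_points):
    """Yield one dict per combination of branch-point values (Cartesian product)."""
    names = sorted(branch_points)
    for values in itertools.product(*(branch_points[n] for n in names)):
        yield dict(zip(names, values))


def compile_to_shell(steps, choice):
    """Emit a bash script that runs every step in dependency order for one realization."""
    # TopologicalSorter expects a mapping of node -> its predecessors.
    graph = {name: deps for name, (deps, _) in steps.items()}
    order = list(TopologicalSorter(graph).static_order())
    lines = ["#!/usr/bin/env bash", "set -euo pipefail"]
    for name in order:
        # Fill in this realization's branch-point values; unused keys are ignored.
        command = steps[name][1].format(**choice)
        # Append a log entry before each step so every run records what it executed.
        lines.append(f"echo {shlex.quote('[step ' + name + '] ' + command)} >> workflow.log")
        lines.append(command)
    return "\n".join(lines)


if __name__ == "__main__":
    for choice in realizations(BRANCH_POINTS):
        print("# realization:", choice)
        print(compile_to_shell(STEPS, choice))
        print()
```

With two aligners and two phrase-length limits, this produces four scripts. Note that the sketch naively recompiles the whole workflow for each realization, so the "lm" step, which depends on no branch point, would run four times; a more careful tool could share such steps across realizations.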
