Monday, March 15, 2010

Two talks

(1) Greg Hanneman:
Title: The Stat-XFER Group Submission for WMT '10

Each year, the Workshop on Statistical Machine Translation collects state-of-the-art MT results for a variety of European language pairs via a shared translation task. In this talk, I will describe the CMU Stat-XFER MT group's submission to this year's WMT French-English track, our third submission to the WMT series, built using the Joshua decoder. A large focus will be on the modeling decisions and system-building techniques that have changed since our earlier submissions as a result of new research carried out in our group. I will also present some open questions facing builders of large-scale hierarchical MT systems in general.

(2) Vamshi Ambati:
Title: Making sense of Crowd data for Machine Translation

The quality of crowd data is a common concern in crowd-sourcing approaches to data collection. When working with crowd data, the objectives are twofold: maximizing the quality of data from non-experts, and minimizing the cost of annotation by pruning noisy annotators.
I will discuss our recent experiments in Machine Translation on selecting high-quality crowd translations by explicitly modeling annotator reliability based on agreement with other submissions. I will also present some preliminary results in cost minimization and report on their adaptation to, and feasibility for, machine translation.
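
As a rough illustration of what "modeling annotator reliability based on agreement with other submissions" could look like, here is a minimal Python sketch. It is not the method presented in the talk. The token-overlap measure, the function names, and the way reliability is combined with per-sentence agreement are all assumptions made for this example.

```python
from collections import defaultdict


def token_overlap(a, b):
    """Jaccard overlap between the lowercased token sets of two strings.

    Assumption: a crude stand-in for whatever similarity measure is really used.
    """
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def agreement_scores(submissions):
    """Mean overlap of each submission with the other submissions for the same sentence.

    `submissions` maps sentence id -> {annotator id -> translation}.
    Sentences with a single submission get no score.
    """
    scores = {}
    for sent_id, by_annotator in submissions.items():
        for annotator, text in by_annotator.items():
            others = [t for a, t in by_annotator.items() if a != annotator]
            if others:
                total = sum(token_overlap(text, o) for o in others)
                scores[(sent_id, annotator)] = total / len(others)
    return scores


def annotator_reliability(scores):
    """Average agreement score per annotator across all their submissions."""
    per_annotator = defaultdict(list)
    for (_, annotator), score in scores.items():
        per_annotator[annotator].append(score)
    return {a: sum(v) / len(v) for a, v in per_annotator.items()}


def select_translations(submissions):
    """Pick one translation per sentence, weighting agreement by annotator reliability."""
    scores = agreement_scores(submissions)
    reliability = annotator_reliability(scores)
    selected = {}
    for sent_id, by_annotator in submissions.items():
        best = max(
            by_annotator,
            key=lambda a: reliability.get(a, 0.0) * scores.get((sent_id, a), 0.0),
        )
        selected[sent_id] = by_annotator[best]
    return selected, reliability
```

In this sketch, an annotator whose translations rarely resemble anyone else's ends up with a low reliability score. Such annotators are natural candidates for pruning when the goal is to reduce annotation cost.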

