Tuesday, April 28, 2009

EBMT with external word alignment and chunk alignment

Title: EBMT with external word alignment and chunk alignment.

Who: Jae Dong Kim
When: Tuesday May 12, 12:00pm
Where: NSH 3305

Abstract: Since both EBMT and SMT are data-driven methods, more accurate word alignment improves system performance in EBMT just as it does in SMT. However, EBMT research has focused on finding analogous examples, while SMT has achieved reasonably accurate word alignment. It is therefore natural to expect that EBMT can benefit from SMT word alignment. In this talk, I will present our approach to making use of more accurate external word alignment from SMT in our EBMT system. I will also present preliminary results on chunk alignment for translation in EBMT.
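As a concrete illustration of the core operation such an approach depends on, here is a minimal Python sketch (not the system from the talk; all names and data are illustrative) that reads an external word alignment in the common "i-j" text format produced by SMT aligners and projects a source phrase of a stored example onto its target span:

# Sketch: project a source phrase onto its target span using an external
# word alignment in "i-j" format (e.g. "0-0 1-2 2-1"). Illustrative only.

def parse_alignment(line):
    """Parse 'i-j' pairs into a set of (source_idx, target_idx) links."""
    return {tuple(map(int, pair.split("-"))) for pair in line.split()}

def project_phrase(links, src_start, src_end):
    """Return the minimal target span covering source span [src_start, src_end)."""
    targets = [j for (i, j) in links if src_start <= i < src_end]
    if not targets:
        return None  # unaligned source phrase: no usable translation example
    return min(targets), max(targets) + 1

links = parse_alignment("0-0 1-2 2-1")
src = "la casa azul".split()
tgt = "the blue house".split()
span = project_phrase(links, 1, 3)          # source phrase "casa azul"
if span is not None:
    print(" ".join(tgt[span[0]:span[1]]))   # -> "blue house"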

Monday, April 13, 2009

Language Model Adaptation for Difficult to Translate Phrases

Presenter: Behrang Mohit
Title: Language Model Adaptation for Difficult to Translate Phrases
Date: Tuesday 12:30pm, 14 April 2009

Abstract:
We investigate the idea of adapting language models for phrases that
have poor translation quality. We apply a selective adaptation
criterion which uses a classifier to locate the most difficult phrase
of each source language sentence. A special adapted language model is
constructed for the highlighted phrase. Our adaptation heuristic uses
lexical features of the phrase to locate the relevant parts of the
parallel corpus for language model training. As we vary the
experimental setup by changing the size of the SMT training data, our
adaptation method consistently shows strong improvements over the
baseline systems.
This is joint work with Frank Liberato and Rebecca Hwa.
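To make the adaptation heuristic concrete, here is a toy Python sketch built on the abstract's description that lexical features of the difficult phrase locate the relevant parts of the parallel corpus. The overlap threshold, function names, and the bigram-count stand-in for LM training are all assumptions for illustration:

# Sketch: select parallel-corpus sentences by lexical overlap with the
# difficult phrase, then "train" a small LM (here, just bigram counts)
# on the target side of the selected subset. Data is invented.

from collections import Counter

def select_training_data(phrase, parallel_corpus, min_overlap=1):
    """Keep sentence pairs whose source side shares words with the phrase."""
    phrase_words = set(phrase.split())
    return [tgt for src, tgt in parallel_corpus
            if len(phrase_words & set(src.split())) >= min_overlap]

def train_bigram_counts(sentences):
    """Toy stand-in for LM training: collect target-side bigram counts."""
    counts = Counter()
    for sent in sentences:
        words = ["<s>"] + sent.split() + ["</s>"]
        counts.update(zip(words, words[1:]))
    return counts

corpus = [("el banco del rio", "the bank of the river"),
          ("el banco central", "the central bank")]
relevant = select_training_data("banco del rio", corpus)
print(train_bigram_counts(relevant).most_common(3))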

Thursday, March 12, 2009

Moving Beyond Phrase-Pairs: Dynamically Scoring Collections of Translation Examples

Title: Moving Beyond Phrase-Pairs: Dynamically Scoring Collections of Translation Examples

Who: Aaron B. Phillips
When: Friday Mar 13, 12:00pm
Where: NSH 1507

Abstract:

Statistical Machine Translation has prospered because it is based on
models that are consistent and straightforward to optimize. The
log-linear model in particular allows the researcher to exploit
numerous, possibly dependent, features. However, the modeling approach
taken by SMT enforces a particular top-down view of the data using
phrase-pairs that does not easily allow for the integration of features
that may change from example to example. What I propose is a shift in
how the model is built. Inspired by Example-Based Machine Translation, I
calculate features for each example separately, but, as in SMT, this
information is collected into a single log-linear model that is
straightforward to optimize. This is accomplished by identifying at
run-time the most appropriate collection of translation examples instead
of using precomputed phrase-pairs. A search is performed over each
example-specific feature such as the alignment quality, genre, or
context to determine a collection that maximizes the score. The weights
for each example-specific feature are adjustable during optimization and
allow for a trade-off between forming collections over all the examples
and forming collections that consist of a few high-quality examples.
This framework seeks to unify the approaches of EBMT and SMT. It results
in a model that is straightforward to optimize *and* allows the
integration of novel example-specific features.
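The size-versus-quality trade-off described above can be made concrete with a small sketch. This is not the talk's actual model; the feature (a single alignment-quality score), the weights, and the greedy search below are invented for illustration:

# Sketch: a log-linear score over a whole collection of examples, with a
# log-size bonus (favoring collections over all examples) competing with
# a mean-quality term (favoring a few high-quality examples).

import math

def collection_score(examples, weights):
    """Log-linear trade-off between collection size and example quality."""
    n = len(examples)
    mean_quality = sum(ex["align_quality"] for ex in examples) / n
    return weights["size"] * math.log(n) + weights["quality"] * mean_quality

def best_collection(examples, weights):
    """Greedily grow the collection from the highest-quality examples,
    stopping once adding another example no longer improves the score."""
    ranked = sorted(examples, key=lambda ex: ex["align_quality"], reverse=True)
    chosen, best = ranked[:1], collection_score(ranked[:1], weights)
    for ex in ranked[1:]:
        score = collection_score(chosen + [ex], weights)
        if score <= best:
            break
        chosen, best = chosen + [ex], score
    return chosen

examples = [{"align_quality": 0.9}, {"align_quality": 0.7}, {"align_quality": 0.2}]
print(len(best_collection(examples, {"size": 0.2, "quality": 1.0})))  # -> 2

With these weights the low-quality third example is left out; raising the "size" weight pulls the search toward using all examples.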

Tuesday, February 3, 2009

An Overview of Tree-to-String Translation Models: Yang Liu

Speaker: Yang Liu
Title: An Overview of Tree-to-String Translation Models

Abstract:

Recent research on statistical machine translation has led to the rapid development of syntax-based translation models, in which syntactic information can be exploited to direct translation. In this talk, I will give an overview of tree-to-string translation models, one of the state-of-the-art families of syntax-based models. In a tree-to-string model, the source side is a phrase-structure parse tree and the target side is a string. The talk covers the following topics: (1) the naive tree-to-string model, (2) the tree-sequence-based tree-to-string model, (3) the context-aware tree-to-string model, and (4) the forest-based tree-to-string model. Experimental results show that the forest-based tree-to-string model significantly outperforms the hierarchical phrase-based model.
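To illustrate what a naive tree-to-string rule looks like in practice, here is a minimal Python sketch; the tree encoding, the two-word lexicon, and the single reordering rule are invented for illustration and are not from the talk:

# Sketch: a tree-to-string rule pairs a source parse-tree fragment with a
# target string template over its variables. Rules apply top-down; nodes
# with no matching rule fall back to monotone concatenation ("glue").

LEXICON = {"布什": "Bush", "总统": "president"}  # lexical rules

def translate(node):
    """Apply rules top-down over a (label, children) tree."""
    label, children = node
    if isinstance(children, str):                    # preterminal: lexical rule
        return LEXICON.get(children, children)
    if label == "NP" and [c[0] for c in children] == ["NR", "NN"]:
        # Reordering rule NP(x1:NR x2:NN) -> x2 x1
        return translate(children[1]) + " " + translate(children[0])
    return " ".join(translate(c) for c in children)  # glue rule fallback

src_tree = ("NP", [("NR", "布什"), ("NN", "总统")])
print(translate(src_tree))  # -> "president Bush"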

Short Bio:

Yang Liu is an Assistant Researcher at the Institute of Computing Technology, Chinese Academy of Sciences. He graduated in Computer Science from Wuhan University in 2002 and received his PhD degree in Computer Science from the Institute of Computing Technology, Chinese Academy of Sciences. His major research interests include statistical machine translation and Chinese information processing. His publications on discriminative word alignment and tree-to-string models have received wide attention. He has served as a PC member/reviewer for TALIP, ACL, EMNLP, AMTA, and SSST.


Tuesday, January 13, 2009

Parallel Treebanks in Machine Translation

Title: Parallel Treebanks in Machine Translation
Speaker: John Tinsley, a Ph.D. student at the National Centre for Language Technology at DCU

Monday, December 8, 2008

Fast MT Pipeline: Introduction to tools you can use

Date: 09-Dec-2008

Qin and Alok will report on their work to speed up parts of
the MT pipeline using parallel processing, with an emphasis
on the tools they have developed for this kind of work.


Title: Fast MT Pipeline: Introduction to tools you can use.

Abstract: In this talk, we would like to introduce some recently
developed tools that you can use to speed up the MT pipeline.
The tools of focus are:
(i) multi-threaded giza: faster word alignment.
(ii) chaksi: phrase extraction on the M45 cluster.
(iii) trambo: decoding/MERT on the M45 cluster.
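For readers unfamiliar with this style of parallelization, here is a generic Python sketch of the sharding pattern such tools build on (split the corpus into shards, process shards in parallel worker processes, merge results in order). It is not the implementation of any of the tools named above:

# Sketch: shard-and-merge parallelism for a corpus-processing stage.

from multiprocessing import Pool

def process_chunk(sentences):
    """Stand-in for one pipeline stage (e.g. alignment or phrase extraction)."""
    return [s.lower() for s in sentences]

def run_parallel(corpus, n_workers=4):
    """Split the corpus into shards, process them in parallel, merge in order."""
    size = max(1, len(corpus) // n_workers)
    shards = [corpus[i:i + size] for i in range(0, len(corpus), size)]
    with Pool(n_workers) as pool:
        results = pool.map(process_chunk, shards)
    return [line for shard in results for line in shard]

if __name__ == "__main__":
    print(run_parallel(["Fast MT Pipeline", "Introduction to tools"]))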

Sunday, November 16, 2008

Presentations

Date: 11 Nov 2008
Time: 12:00-1:30pm
Room: NSH 3305

Presentations:

Andreas Zollmann: Wider Pipelines: N-Best Alignments and Parses in MT Training

State-of-the-art statistical machine translation systems use hypotheses from several maximum a posteriori inference steps, including word alignments and parse trees, to identify translational structure and estimate the parameters of translation models. While this approach leads to a modular pipeline of independently developed components, errors made in these “single-best” hypotheses can propagate to downstream estimation steps that treat these inputs as clean, trustworthy training data. In this work we integrate N-best alignments and parses by using a probability distribution over these alternatives to generate posterior fractional counts for use in downstream estimation. Using these fractional counts in a DOP-inspired syntax-based translation system, we show significant improvements in translation quality over a single-best trained baseline.
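The posterior fractional-count idea can be illustrated with a short sketch: normalize the N-best hypotheses' model scores into a posterior distribution, then credit each extracted unit with the summed posterior of every hypothesis containing it. The data and names below are invented for illustration:

# Sketch: fractional counts from an N-best list of alignment hypotheses.

import math
from collections import defaultdict

def posteriors(log_scores):
    """Softmax over N-best log scores (subtract the max for stability)."""
    m = max(log_scores)
    exps = [math.exp(s - m) for s in log_scores]
    z = sum(exps)
    return [e / z for e in exps]

def fractional_counts(nbest):
    """nbest: list of (log_score, extracted_units); credit each unit with
    the posterior mass of every hypothesis it appears in."""
    probs = posteriors([score for score, _ in nbest])
    counts = defaultdict(float)
    for p, (_, units) in zip(probs, nbest):
        for unit in units:
            counts[unit] += p
    return counts

nbest = [(-1.0, [("la casa", "the house")]),
         (-1.5, [("la casa", "the house"), ("casa", "house")])]
print(dict(fractional_counts(nbest)))
# ("la casa", "the house") gets count 1.0; ("casa", "house") about 0.38.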


Silja Hildebrand: Combination of Machine Translation Systems via Hypothesis Selection from Combined N-Best Lists

Different approaches in machine translation achieve similar translation quality with a variety of translations in the output. Recently it has been shown that it is possible to leverage the individual strengths of various systems and improve the overall translation quality by combining translation outputs. In this paper we present a method of hypothesis selection which is relatively simple compared to system combination methods that construct a synthesis of the input hypotheses. Our method uses information from n-best lists from several MT systems and sentence-level features that are independent of the MT systems involved to improve the translation quality.
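A minimal sketch of system-independent hypothesis selection, assuming (as one plausible instance of the sentence-level features the abstract mentions) an n-gram consensus feature plus a length penalty. The feature choice and weight are assumptions, not the paper's exact feature set:

# Sketch: pick, from a combined n-best list, the hypothesis with the best
# bigram agreement with all other hypotheses minus a length penalty.

def ngrams(words, n=2):
    return set(zip(*(words[i:] for i in range(n))))

def agreement(hyp, others):
    """Mean bigram overlap between hyp and every other hypothesis."""
    h = ngrams(hyp.split())
    if not h or not others:
        return 0.0
    overlaps = [len(h & ngrams(o.split())) / len(h) for o in others]
    return sum(overlaps) / len(overlaps)

def select(hypotheses, length_weight=0.01):
    def score(i):
        others = hypotheses[:i] + hypotheses[i + 1:]
        return agreement(hypotheses[i], others) - length_weight * len(hypotheses[i].split())
    return max(range(len(hypotheses)), key=score)

combined = ["the house is blue", "the house is blue .", "a blue house"]
print(combined[select(combined)])  # -> "the house is blue"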