Speaker: Alok Parlikar
Title: (S (NP (NP Trees) (SBAR (WHNP that) (S (VP can)))) (VP help))
Summary:
For the past two months, I have been working with Alon Lavie and Stephan
Vogel on Chinese and English parse trees to investigate the following
questions:
(a) Can constituency information and word level alignments be used to
align nodes in trees of parallel sentences? How precisely matched
(in meaning) are the yields of these aligned nodes?
(b) Can the parse trees and word-level alignments be used for learning
reordering rules? If we use these rules to reorder source sentences,
can we do any better at translation?
The current results show that:
(a) - Node alignments from hand-aligned data are very precise.
- Using automatic word alignments to align nodes gives over 70%
precision and over 40% recall.
(b) Using a 10-best reordering of words in the source sentences with
a "dumb" reordering strategy has shown a 0.005 improvement in BLEU
score.
I would like to talk about the approaches that we have taken here and to
discuss strategies for improving these results.
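As a rough illustration of question (a), the sketch below pairs a source node with a target node when every word-alignment link touching one node's yield lands inside the other's yield, and then scores the result against gold node alignments. This consistency criterion is an assumption made for illustration, not necessarily the method used in the talk; the names align_nodes and precision_recall and the span representation are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]   # inclusive (first, last) word index of a node's yield
Link = Tuple[int, int]   # (source word index, target word index)


def align_nodes(src_spans: List[Span], tgt_spans: List[Span],
                links: Set[Link]) -> List[Tuple[Span, Span]]:
    """Pair nodes whose yields are linked only to each other (assumed criterion)."""
    aligned = []
    for s in src_spans:
        # Target positions linked to any word inside the source yield.
        tgt_hits = {j for (i, j) in links if s[0] <= i <= s[1]}
        if not tgt_hits:
            continue  # no word links, so nothing to anchor this node on
        for t in tgt_spans:
            # Every link leaving the source yield must land inside t.
            if min(tgt_hits) < t[0] or max(tgt_hits) > t[1]:
                continue
            # Every link leaving the target yield must land inside s.
            src_hits = {i for (i, j) in links if t[0] <= j <= t[1]}
            if all(s[0] <= i <= s[1] for i in src_hits):
                aligned.append((s, t))
    return aligned


def precision_recall(predicted, gold) -> Tuple[float, float]:
    """Precision and recall of predicted node pairs against gold node pairs."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall
```

Note that with unaligned words or unary chains, one source node can be consistent with several target nodes; a real system would need a tie-breaking rule, which this sketch leaves out.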
Tuesday, November 13, 2007
Tuesday, October 9, 2007
Sub-Phrasal Matching and Structural Templates in Example-Based MT
Speaker: Aaron Phillips
Title: Sub-Phrasal Matching and Structural Templates in Example-Based MT
Example-Based Machine Translation (EBMT) encompasses many different
approaches to data-driven MT. In this work I first look at two different
paradigms of EBMT. I then combine the strengths of these two systems and
build a new engine that combines sub-phrasal matching with structural
templates. The end result is a melding of ideas from EBMT, SMT, and
Xfer. This synthesis results in higher translation quality and more
graceful degradation, yielding 1.5% to 7.5% relative improvement in BLEU
scores.
This work was recently presented at TMI. The full paper can be found
here:
http://dustoftheground.net/techne/research/sub-phrasal_matching_and_structural_templates_in_example-based_mt.pdf
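Since the gains above are stated as relative improvements, a small worked calculation may help separate relative from absolute BLEU differences. The scores in the usage line are made-up numbers chosen only to show the arithmetic, not results from the paper, and relative_improvement is a hypothetical helper.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline`, as a fraction of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (improved - baseline) / baseline


# Hypothetical scores: an absolute gain of 0.015 BLEU on a 0.20 baseline
# corresponds to a relative gain of 7.5%.
gain = relative_improvement(0.20, 0.215)
print(f"{gain:.1%}")
```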
Tuesday, August 14, 2007
Experiments with a Noun-Phrase driven Statistical Machine Translation System
Title: Experiments with a Noun-Phrase driven Statistical Machine
Translation System
Speaker: Sanjika Hewavitharana
Date: Aug 14, 2007
Time: 12:00pm
Place: NSH 3305 [During MT Group Monthly Lunch]
Abstract:
Hierarchical translation models that use phrases with words as well as
sub-phrases have shown better performance than standard phrase-based
systems. In this talk I will present a noun-phrase (NP) driven
statistical machine translation system. Using noun-phrases as the
decomposition unit, we build a two-level hierarchy of phrases. We first
identify noun-phrases in the data and replace them with a tag to produce
an NP-tagged corpus. This corpus is then used to extract NP-tagged
phrase translation pairs. Both noun-phrases and NP-tagged phrases are
used in a two-level translation decoder. The two-level system shows
significant improvements over a baseline phrase-based SMT system. It
also produces longer matching phrases due to the generalization
introduced by tagging noun-phrases.
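The tagging step described above can be sketched as follows. This is a minimal illustration that assumes the noun-phrase spans have already been identified by some chunker or parser and do not overlap; the function name tag_noun_phrases, the "<NP>" tag string, and the half-open span convention are all assumptions, not details from the talk.

```python
from typing import List, Tuple


def tag_noun_phrases(tokens: List[str],
                     np_spans: List[Tuple[int, int]],
                     tag: str = "<NP>") -> Tuple[List[str], List[List[str]]]:
    """Replace each NP span [start, end) with a tag.

    Returns the NP-tagged sentence (upper level of the hierarchy) and the
    extracted noun phrases (lower level). Spans are assumed non-overlapping.
    """
    tagged: List[str] = []
    noun_phrases: List[List[str]] = []
    pos = 0
    for start, end in sorted(np_spans):
        tagged.extend(tokens[pos:start])   # copy words before this NP
        tagged.append(tag)                 # stand-in for the whole NP
        noun_phrases.append(tokens[start:end])
        pos = end
    tagged.extend(tokens[pos:])            # copy any trailing words
    return tagged, noun_phrases
```

Because every noun phrase collapses to the same tag, sentences that differ only in their noun phrases map to the same tagged sequence, which is one way to see why tagging can lead to longer phrase matches.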
Tuesday, May 22, 2007
In Search of Better MT Evaluation Metric : Some Experiments
Date: May 22, 2007
Presenter: Abhaya Agarwal
Title: In Search of Better MT Evaluation Metric : Some Experiments
Abstract: The area of automatic metrics for MT evaluation has seen a lot
of activity in the last 4-5 years. Starting with BLEU, many such metrics
have been proposed over the years, including METEOR, HTER and ROUGE.
While these metrics achieve good correlation with human judgments at the
system level, the situation remains bleak at the individual sentence
level. In this talk, I will describe some work that we have been doing
towards developing metrics with improved correlation with human
judgments at the sentence level. I will present the current results and
some possible future directions.
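Sentence-level correlation with human judgments is commonly measured by pairing each sentence's metric score with its human score and computing a correlation coefficient. The sketch below uses Pearson correlation as one reasonable choice; the abstract does not say which coefficient the speaker uses, and the function name pearson is a hypothetical helper.

```python
import math
from typing import Sequence


def pearson(metric_scores: Sequence[float],
            human_scores: Sequence[float]) -> float:
    """Pearson correlation between per-sentence metric and human scores."""
    n = len(metric_scores)
    if n != len(human_scores) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_m = sum(metric_scores) / n
    mean_h = sum(human_scores) / n
    cov = sum((m - mean_m) * (h - mean_h)
              for m, h in zip(metric_scores, human_scores))
    var_m = sum((m - mean_m) ** 2 for m in metric_scores)
    var_h = sum((h - mean_h) ** 2 for h in human_scores)
    if var_m == 0 or var_h == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_m * var_h)
```

A rank-based coefficient such as Spearman's would be an alternative when human scores are ordinal ratings rather than continuous values.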
Tuesday, April 17, 2007
An Assessment of Language Elicitation without the Supervision of a Linguist
Date: Apr 17, 2007
Presenter: Lori Levin and Alison Alvarez
Title: An Assessment of Language Elicitation without the Supervision of a Linguist
We created an elicitation corpus designed to elicit the morphosyntactic
features of a target language without the supervision of a linguist.
The corpus is composed of approximately 3200 English source sentences
that are then translated by a native speaker into the target language.
The design of our corpus was driven by our need to elicit morphosyntactic
language features without the supervision of a linguist. In a previous
paper we reported on a reverse Treebank, which was a deep
morphosyntactic tree with two parallel human language sentences. The
first is provided by reverse annotation and the second is acquired
through elicitation. This presentation will focus on the extent to which
we were able to acquire our morphosyntactic information from our translated
corpora and the types of errors we encountered, both from the perspective
of the translator and the corpus itself.
Tuesday, February 20, 2007