Tuesday, November 13, 2007

Trees that can help

Speaker: Alok Parlikar

Title: (S (NP (NP Trees) (SBAR (WHNP that) (S (VP can)))) (VP help))


For the past two months, I have been working with Alon Lavie and Stephan
Vogel on Chinese and English parse trees to investigate the following
questions:

(a) Can constituency information and word-level alignments be used to
    align nodes in the trees of parallel sentences? How precisely do the
    yields of these aligned nodes match in meaning?
(b) Can the parse trees and word-level alignments be used to learn
    reordering rules? If we use these rules to reorder source sentences,
    does translation quality improve?
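
As a rough illustration of question (a), the sketch below pairs constituent
spans from two parse trees when the word alignment is consistent with them.
It assumes the consistency criterion commonly used in phrase extraction (no
word link may cross the boundary of either span, and at least one link must
fall inside), which is not necessarily the criterion used in this work. The
function names, the half-open span convention, and the toy data are all
hypothetical.

```python
def consistent(src_span, tgt_span, links):
    """Return True if no word link crosses either half-open span
    and at least one link lies inside both spans."""
    (i, j), (k, l) = src_span, tgt_span
    inside = 0
    for s, t in links:
        s_in = i <= s < j
        t_in = k <= t < l
        if s_in != t_in:  # the link leaves one of the two spans
            return False
        if s_in:
            inside += 1
    return inside > 0


def align_nodes(src_spans, tgt_spans, links):
    """Pair every source constituent span with each consistent target span."""
    return [(s, t) for s in src_spans for t in tgt_spans
            if consistent(s, t, links)]


# Toy example: a two-word sentence pair whose words swap order.
links = [(0, 1), (1, 0)]               # (source index, target index)
spans = [(0, 1), (1, 2), (0, 2)]       # constituent spans in both trees
node_alignments = align_nodes(spans, spans, links)
print(node_alignments)
```

In the toy example, each single-word node aligns with the node covering the
opposite position on the other side, and the two root nodes align with each
other.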

The current results show that:

(a) - Node alignments derived from hand-aligned data are very precise.
    - Using automatic word alignments to align nodes gives over 70%
      precision and over 40% recall.
(b) Using a 10-best reordering of the words in the source sentences with
    a "dumb" reordering strategy has shown a 0.005 improvement in BLEU.
I would like to talk about the approaches that we have taken here and to
discuss strategies for improving these results.

Tuesday, August 14, 2007

Experiments with a Noun-Phrase driven Statistical Machine Translation System

Title: Experiments with a Noun-Phrase driven Statistical Machine
Translation System

Speaker: Sanjika Hewavitharana
Date: Aug 14, 2007
Time: 12:00pm
Place: NSH 3305 [During MT Group Monthly Lunch]


Hierarchical translation models, whose phrases contain sub-phrases as
well as words, have shown better performance than standard phrase-based
systems. In this talk I will present a noun-phrase (NP) driven
statistical machine translation system. Using noun-phrases as the
decomposition unit, we build a two-level hierarchy of phrases. We first
identify noun-phrases in the data and replace them with a tag to produce
an NP-tagged corpus. This corpus is then used to extract NP-tagged
phrase translation pairs. Both noun-phrases and NP-tagged phrases are
used in a two-level translation decoder. The two-level system shows
significant improvements over a baseline phrase-based SMT system. It
also produces longer matching phrases due to the generalization
introduced by tagging noun-phrases.
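
The tagging step described above can be sketched as follows. This is a
minimal illustration only: it assumes the noun-phrase spans have already
been identified (for example by a chunker or parser) and are given as
non-overlapping half-open token ranges. The tag string "@NP" and the
function name are hypothetical and are not taken from the system itself.

```python
def tag_noun_phrases(tokens, np_spans, tag="@NP"):
    """Replace each NP span with a tag.

    Returns the NP-tagged token list and the list of extracted noun-phrases,
    in left-to-right order.
    """
    tagged, phrases = [], []
    pos = 0
    for start, end in sorted(np_spans):
        if start < pos:
            raise ValueError("NP spans must not overlap")
        tagged.extend(tokens[pos:start])   # copy words before the NP
        tagged.append(tag)                 # stand-in for the whole NP
        phrases.append(tokens[start:end])  # keep the NP for the lower level
        pos = end
    tagged.extend(tokens[pos:])            # copy words after the last NP
    return tagged, phrases


tokens = "the old man bought a red car".split()
tagged, phrases = tag_noun_phrases(tokens, [(0, 3), (4, 7)])
print(" ".join(tagged))
print(phrases)
```

Because every noun-phrase collapses to a single tag, sentences that differ
only in their noun-phrases yield the same tagged pattern, which is one way
to picture the generalization mentioned above.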