Title: Simulating Multiple Translations and ASR Transcripts for Applications in Multilingual Spoken Document Classification
Speaker: Wei-Hao Lin from the Informedia group
Abstract:
  We propose a statistical model that simulates multiple documents and
  their translations (e.g., Chinese documents and their English
  translations), and we apply the model to the task of classifying
  multilingual documents.  The model, based on a frequency-matching
  principle, makes two predictions. First, previous approaches that
  build classifiers over a common language (e.g., English) are not
  optimal for multilingual collections with unbalanced numbers of
  documents. Second, a proposed multilingual representation can
  outperform the monolingual bag-of-words representation.  We also
  investigate combining multiple ASR transcripts and translations
  through re-weighting.  The close match between the simulation
  model's predictions and the empirical results of classifying
  multilingual spoken documents from broadcast news in three languages
  strongly supports the model's validity.
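The abstract contrasts a monolingual bag-of-words (everything mapped into one common language) with a multilingual representation, and mentions combining several ASR transcripts and translations by re-weighting. It does not specify either scheme. The Python below is therefore only a hypothetical sketch of what such a combination might look like, not the speaker's method. The function names (`bag_of_words`, `combine_reweighted`, `multilingual_features`), the per-source weights, the language-tag prefixing, and the whitespace tokenization (which assumes Chinese text is already word-segmented) are all illustrative assumptions.

```python
from collections import Counter


def bag_of_words(text):
    """Lowercased, whitespace-tokenized term counts.

    Assumption: text is already segmented into words (needed for Chinese).
    """
    return Counter(text.lower().split())


def combine_reweighted(transcripts, weights):
    """Hypothetical re-weighting: merge several ASR transcripts or
    translations of one document into a single bag of words, scaling
    each source's term counts by that source's weight."""
    if len(transcripts) != len(weights):
        raise ValueError("need exactly one weight per transcript")
    combined = Counter()
    for text, weight in zip(transcripts, weights):
        for term, count in bag_of_words(text).items():
            combined[term] += weight * count
    return combined


def multilingual_features(doc_by_language):
    """Hypothetical multilingual representation: keep each language's
    terms in a separate feature space by prefixing them with a language
    tag, instead of translating everything into one common language."""
    features = Counter()
    for lang, text in doc_by_language.items():
        for term, count in bag_of_words(text).items():
            features[f"{lang}:{term}"] += count
    return features


if __name__ == "__main__":
    # Two competing ASR hypotheses for the same news story, trusted 3:1.
    print(combine_reweighted(["stocks fell sharply", "stocks fell sharp"],
                             [0.75, 0.25]))
    # The same story in English and (pre-segmented) Chinese.
    print(multilingual_features({"en": "Stocks fell", "zh": "股市 下跌"}))
```

Under this sketch, a term that all transcripts agree on keeps its full weight. A term that only one recognizer produced is down-weighted, so recognition errors contribute less to the classifier's input. The weights here are arbitrary placeholders.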