Effective Use of Linguistic and Contextual Information for Statistical Machine Translation

Libin Shen, Jinxi Xu, Bing Zhang, Spyros Matsoukas, Ralph M. Weischedel

EMNLP 2009 (modified: 10 Nov 2022)

Abstract: Current methods of using lexical features in machine translation have difficulty scaling up to realistic MT tasks because of the prohibitively large number of parameters involved. In this paper, we propose methods of using new linguistic and contextual features that do not suffer from this problem and apply them in a state-of-the-art hierarchical MT system. The features used in this work are non-terminal labels, non-terminal length distribution, source string context, and source dependency LM scores. The effectiveness of our techniques is demonstrated by significant improvements over a strong baseline. On Arabic-to-English translation, improvements in lower-cased BLEU on decoding output are 2.0 on NIST MT06 and 1.7 on MT08 newswire data. On Chinese-to-English translation, the improvements are 1.0 on MT06 and 0.8 on MT08 newswire data.
