Abstract: We present a new variant of the Syntax-Augmented Machine Translation (SAMT) formalism that incorporates a category-coarsening algorithm originally developed for tree-to-tree grammars. We induce bilingual labels into the SAMT grammar, use them for category coarsening, and then project back to monolingual labeling as in standard SAMT. The result is a “collapsed” grammar with the same expressive power and format as the original, but far fewer nonterminal labels. We show that the smaller label set improves translation scores by 1.14 BLEU on two Chinese-English test sets while reducing the occurrence of the sparsity and ambiguity problems common to large label sets.