Abstract: Tree-based machine translation models can perform long-distance reordering by incorporating the syntactic annotations of parse trees from one or both sides of the bitext. However, parsing accuracy usually drops as sentence length increases, which in turn degrades the performance of tree-based machine translation. To alleviate this problem, we translate clauses rather than entire sentences; the challenge is to split the source sentences appropriately. In this paper, we propose a novel approach that induces a clause parser from word-aligned parallel corpora, and we test its effectiveness on tree-to-string machine translation. Experiments on multiple translation tasks show that our approach outperforms previous rule-based approaches, which depend mainly on punctuation and predefined rules. More importantly, our approach works much better than the rule-based method on text without punctuation.