Comparing the Sentence Alignment Yield from Two News Corpora Using a Dictionary-Based Alignment System

2003 (modified: 16 Jul 2019), ParallelTexts@NAACL-HLT 2003
Abstract: Corpus-based MT requires large sentence-aligned bilingual corpora, but these are hard to find for Japanese. Bilingual news corpora seem to offer a useful resource for machine translation, but their quality is variable. Sentence alignments produced by filtering literal word translations from the NHK corpus yield disappointing results, though correlating NP translations performs better. Applying this method to the Nikkei corpus gives even better results. This paper reports sentence alignment results from two corpora, obtained with a two-pass dictionary-based alignment system.