Automatic Parallel Fragment Extraction from Noisy Data

2012 (modified: 12 Nov 2022), HLT-NAACL 2012
Abstract: We present a novel method to detect parallel fragments within noisy parallel corpora. Isolating these parallel fragments from the noisy data in which they are contained frees us from noisy alignments and stray links that can severely constrain translation-rule extraction. We do this with existing machinery, making use of an existing word alignment model for this task. We evaluate the quality and utility of the extracted data on large-scale Chinese-English and Arabic-English translation tasks and show significant improvements over a state-of-the-art baseline.