Extracting Parallel Sub-Sentential Fragments from Non-Parallel Corpora

ACL 2006 (modified: 12 Nov 2022)
Abstract: We present a novel method for extracting parallel sub-sentential fragments from comparable, non-parallel bilingual corpora. By analyzing potentially similar sentence pairs using a signal processing-inspired approach, we detect which segments of the source sentence are translated into segments in the target sentence, and which are not. This method enables us to extract useful machine translation training data even from very non-parallel corpora, which contain no parallel sentence pairs. We evaluate the quality of the extracted data by showing that it improves the performance of a state-of-the-art statistical machine translation system.
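The abstract does not spell out the signal-processing step, so the following is only a minimal sketch of one way such an approach could look, not the authors' implementation. It assumes each target word already carries a hypothetical score in [-1, 1] (positive if a lexicon suggests it has a translation in the source sentence, negative otherwise), smooths that sequence with a moving-average filter, and keeps runs of positive values as candidate fragments. The function names, the filter width, and the minimum fragment length are all illustrative assumptions.

```python
from typing import List, Tuple


def smooth(signal: List[float], width: int = 3) -> List[float]:
    """Moving-average filter; the window is truncated at the sentence edges."""
    half = width // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def positive_spans(signal: List[float], min_len: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, for runs where signal > 0.

    Runs shorter than min_len are discarded (assumed threshold).
    """
    spans = []
    start = None
    for i, value in enumerate(signal):
        if value > 0 and start is None:
            start = i  # a positive run begins here
        elif value <= 0 and start is not None:
            if i - start >= min_len:
                spans.append((start, i))
            start = None
    # Close a run that extends to the end of the sentence.
    if start is not None and len(signal) - start >= min_len:
        spans.append((start, len(signal)))
    return spans


def extract_fragments(
    words: List[str],
    scores: List[float],
    width: int = 3,
    min_len: int = 3,
) -> List[List[str]]:
    """Keep the word runs whose smoothed scores stay positive."""
    if len(words) != len(scores):
        raise ValueError("words and scores must have the same length")
    smoothed = smooth(scores, width)
    return [words[s:e] for s, e in positive_spans(smoothed, min_len)]


if __name__ == "__main__":
    # Hypothetical per-word scores: the first three words look translated.
    words = ["the", "new", "law", "was", "widely", "criticized", "abroad"]
    scores = [0.9, 0.8, 0.7, -1.0, -1.0, -1.0, -1.0]
    print(extract_fragments(words, scores))
```

Smoothing before thresholding means an isolated spurious lexicon match inside an untranslated stretch is less likely to survive as a fragment, and a single unmatched word inside a translated stretch is less likely to split it.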