- Abstract: Neural machine translation, which achieves near human-level performance in some languages, strongly relies on the availability of large amounts of parallel sentences, which hinders its applicability to low-resource language pairs. Recent works explore the possibility of unsupervised machine translation with monolingual data only, leading to much lower accuracy compared with the supervised one. Observing that weakly paired bilingual documents are much easier to collect than bilingual sentences, e.g., from Wikipedia, news websites or books, in this paper, we investigate the training of translation models with weakly paired bilingual documents. Our approach contains two components/steps. First, we provide a simple approach to mine implicitly bilingual sentence pairs from document pairs which can then be used as supervised signals for training. Second, we leverage the topic consistency of two weakly paired documents and learn the sentence-to-sentence translation by constraining the word distribution-level alignments. We evaluate our proposed method on weakly paired documents from Wikipedia on four tasks, the widely used WMT16 German$\leftrightarrow$English and WMT13 Spanish$\leftrightarrow$English tasks, and obtain $24.1$/$30.3$ and $28.0$/$27.6$ BLEU points separately, outperforming state-of-the-art unsupervised results by more than 5 BLEU points and reducing the gap between unsupervised translation and supervised translation up to 50\%.
- Keywords: Natural Language Processing, Machine Translation, Unsupervised Learning