TL;DR: We use bilingual dictionaries as a data augmentation source for neural machine translation.
Abstract: We explore ways of incorporating bilingual dictionaries to enable semi-supervised
neural machine translation. Conventional back-translation methods have shown
success in leveraging target-side monolingual data. However, since the quality of
back-translation models is tied to the size of the available parallel corpora, this
could adversely impact the synthetically generated sentences in a low-resource
setting. We propose a simple data augmentation technique to address this
shortcoming. We incorporate widely available bilingual dictionaries that yield
word-by-word translations to generate synthetic sentences. This automatically
expands the vocabulary of the model while maintaining high-quality content. Our
method shows an appreciable improvement in performance over strong baselines.
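
The word-by-word augmentation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy German-to-English dictionary, the whitespace tokenization, and the choice to copy out-of-dictionary words unchanged are all assumptions made for illustration. The abstract also does not state which side is translated; the sketch assumes target-side monolingual sentences are mapped into the source language, mirroring back-translation.

```python
from typing import Dict, List, Tuple


def word_by_word_translate(sentence: str, dictionary: Dict[str, str]) -> str:
    """Translate a sentence token by token with a bilingual dictionary.

    Assumption: tokens missing from the dictionary are copied unchanged.
    """
    tokens = sentence.split()  # naive whitespace tokenization (assumption)
    return " ".join(dictionary.get(tok.lower(), tok) for tok in tokens)


def make_synthetic_pairs(
    monolingual: List[str], dictionary: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Pair each real monolingual sentence with its synthetic translation.

    Returns (synthetic_source, real_target) tuples that could be added
    to the parallel training data.
    """
    return [(word_by_word_translate(s, dictionary), s) for s in monolingual]


# Hypothetical toy dictionary (German -> English), for illustration only.
toy_dict = {"das": "the", "haus": "house", "ist": "is", "klein": "small"}

pairs = make_synthetic_pairs(["Das Haus ist klein"], toy_dict)
print(pairs)
```

Each synthetic pair keeps the real sentence intact on one side, so any noise from the dictionary lookup is confined to the synthetic side.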