TL;DR: We use bilingual dictionaries for data augmentation in neural machine translation.
Abstract: We explore ways of incorporating bilingual dictionaries to enable semi-supervised
neural machine translation. Conventional back-translation methods have shown
success in leveraging target-side monolingual data. However, since the quality of
back-translation models is tied to the size of the available parallel corpora, this
could adversely impact the synthetically generated sentences in a low-resource
setting. We propose a simple data augmentation technique to address this
shortcoming. We incorporate widely available bilingual dictionaries that yield
word-by-word translations to generate synthetic sentences. This automatically
expands the vocabulary of the model while maintaining high-quality content. Our
method shows an appreciable improvement in performance over strong baselines.
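The dictionary-based augmentation step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it rests on assumptions the abstract does not spell out: a target-to-source dictionary is applied token by token to target-side monolingual sentences (mirroring how back-translation uses target-side data), tokens are whitespace-separated, and words missing from the dictionary are copied unchanged. The helper names `word_by_word_translate` and `augment` and the toy English-to-German dictionary are hypothetical.

```python
from typing import Dict, List, Tuple


def word_by_word_translate(sentence: str, dictionary: Dict[str, str]) -> str:
    """Translate a whitespace-tokenized sentence one token at a time.

    Assumption: lookups are case-insensitive, and tokens not found in the
    dictionary are copied through unchanged.
    """
    tokens = sentence.split()
    return " ".join(dictionary.get(tok.lower(), tok) for tok in tokens)


def augment(target_monolingual: List[str],
            tgt2src: Dict[str, str]) -> List[Tuple[str, str]]:
    """Build synthetic (source, target) pairs from target-side monolingual text.

    The synthetic source side is the word-by-word translation; the target
    side is the original, genuine sentence.
    """
    return [(word_by_word_translate(t, tgt2src), t) for t in target_monolingual]


if __name__ == "__main__":
    # Hypothetical toy dictionary: target language English, source language German.
    toy_dictionary = {"the": "die", "cat": "katze", "sleeps": "schläft"}
    for src, tgt in augment(["the cat sleeps", "the dog"], toy_dictionary):
        print(f"{src}\t{tgt}")
```

In a typical semi-supervised setup, pairs produced this way would be mixed with the real parallel corpus before training. Because every dictionary entry can surface on the source side, this is one plausible way the vocabulary expansion mentioned in the abstract could arise; the word order of the synthetic source side, however, simply follows the target sentence.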
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/incorporating-bilingual-dictionaries-for-low/code)