INCORPORATING BILINGUAL DICTIONARIES FOR LOW RESOURCE SEMI-SUPERVISED NEURAL MACHINE TRANSLATION

Mihir Kale, Sreyashi Nag, Varun Lakshinarasimhan, Swapnil Singhavi

Mar 25, 2019 · ICLR 2019 Workshop LLD · Blind Submission · readers: everyone
  • TL;DR: We use bilingual dictionaries for data augmentation for neural machine translation
  • Abstract: We explore ways of incorporating bilingual dictionaries to enable semi-supervised neural machine translation. Conventional back-translation methods have shown success in leveraging target-side monolingual data. However, the quality of a back-translation model is tied to the size of the available parallel corpus, so in a low-resource setting the synthetically generated sentences can suffer. We propose a simple data augmentation technique to address this shortcoming. We incorporate widely available bilingual dictionaries that yield word-by-word translations to generate synthetic sentences. This automatically expands the vocabulary of the model while maintaining high-quality content. Our method shows an appreciable improvement in performance over strong baselines.
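
The dictionary-based augmentation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; several details are assumptions made only for illustration: the synthetic side is taken to be the source sentence (mirroring back-translation), tokenization is plain whitespace splitting, words missing from the dictionary are copied through unchanged, and the names `word_by_word`, `make_synthetic_pairs`, and `toy_dictionary` are hypothetical.

```python
from typing import Dict, List, Tuple


def word_by_word(sentence: str, dictionary: Dict[str, str]) -> str:
    """Translate a sentence one token at a time using a bilingual dictionary."""
    # Assumption: lowercase, whitespace tokenization (a real system would
    # likely use a proper tokenizer).
    tokens = sentence.lower().split()
    # Assumption: tokens missing from the dictionary are copied unchanged.
    return " ".join(dictionary.get(tok, tok) for tok in tokens)


def make_synthetic_pairs(
    target_sentences: List[str], tgt_to_src: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Pair each genuine target sentence with a synthetic source sentence."""
    # Assumption: as in back-translation, the target side is real monolingual
    # text and only the source side is synthetic.
    return [(word_by_word(s, tgt_to_src), s) for s in target_sentences]


# Toy target-to-source dictionary, invented purely for illustration.
toy_dictionary = {"the": "le", "cat": "chat", "sleeps": "dort"}
monolingual_target = ["The cat sleeps", "The dog sleeps"]

synthetic_pairs = make_synthetic_pairs(monolingual_target, toy_dictionary)
for synthetic_source, real_target in synthetic_pairs:
    print(f"{synthetic_source}\t{real_target}")
```

In a setup like this, the synthetic pairs would be mixed with the genuine parallel corpus before training the translation model. How the two are combined or weighted is not stated in the abstract.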