Enabling Unsupervised Neural Machine Translation with Word-level Visual Representations

Published: 07 Oct 2023, Last Modified: 01 Dec 2023
Venue: EMNLP 2023 Findings
Submission Type: Regular Long Paper
Submission Track: Machine Translation
Submission Track 2: Speech and Multimodality
Keywords: Unsupervised Machine Translation, Cross-modal Machine Translation, Word-level Image
TL;DR: Improving unsupervised machine translation with word-level visual representations to address lexical confusion
Abstract: Unsupervised neural machine translation has recently made remarkable strides, achieving impressive results using only monolingual corpora. Nonetheless, these methods still exhibit fundamental flaws, such as confusing similar words. A straightforward remedy is to employ bilingual dictionaries; however, high-quality bilingual dictionaries can be costly to obtain. To overcome this limitation, we propose a method that incorporates images at the word level to augment the lexical mappings. Specifically, our method inserts visual representations into the model, modifying the corresponding embedding layer information. In addition, a visible matrix is adopted to isolate unrelated words from the impact of the images. Experiments on the Multi30k dataset with over 300,000 self-collected images validate the method's effectiveness in generating more accurate word translations, achieving an improvement of up to +2.81 BLEU, which is comparable or even superior to using bilingual dictionaries.
Submission Number: 1773
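
Illustrative sketch: the abstract mentions a visible matrix that keeps word-level images from affecting unrelated words, but it does not say how that matrix is built. The following is one possible reading, not the authors' implementation. It assumes that each image occupies an extra slot appended after the text tokens, that text tokens can all see each other, and that an image slot is visible only to itself and to the word it illustrates. The function name `build_visible_matrix` and its parameters are hypothetical.

```python
def build_visible_matrix(num_tokens, image_links):
    """Build a square 0/1 visibility matrix (assumed layout, see lead-in).

    Positions 0 .. num_tokens-1 are text tokens. Position num_tokens + k is
    the image slot attached to the word at index image_links[k].
    Entry [i][j] == 1 means position i may attend to position j.
    """
    size = num_tokens + len(image_links)
    visible = [[0] * size for _ in range(size)]

    # Assumption: text tokens attend to every other text token as usual.
    for i in range(num_tokens):
        for j in range(num_tokens):
            visible[i][j] = 1

    for k, word_idx in enumerate(image_links):
        if not 0 <= word_idx < num_tokens:
            raise ValueError(f"image {k} is linked to invalid word index {word_idx}")
        img = num_tokens + k
        # Assumption: an image slot sees only itself and its own word,
        # and only that word sees the image slot.
        visible[img][img] = 1
        visible[img][word_idx] = 1
        visible[word_idx][img] = 1

    return visible


# Example: a 4-token sentence where only the token at index 1 has an image.
matrix = build_visible_matrix(4, [1])
for row in matrix:
    print(row)
```

In this sketch, the image slot (position 4) and the token at index 1 are mutually visible, while the other tokens and the image slot are hidden from each other. In a Transformer, such a matrix would typically be converted into an attention mask; how the paper applies it is not specified in the abstract.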