TL;DR: VisDNMT distills visual knowledge from a pre-trained multilingual visual-language model to improve text-only translation without requiring paired images.
Abstract: Multi-modal machine translation (MMT) is the research field that aims to improve neural machine translation (NMT) models with visual knowledge. While existing MMT systems achieve promising performance over text-only NMT methods, they typically require paired text and images as input, which limits their applicability to general translation tasks. To benefit general translation with visual knowledge, we propose VisDNMT, which distills visual knowledge from a pre-trained multilingual visual-language model to aid translation. In particular, we train a Transformer-based model jointly with a standard cross-entropy loss for translation and a knowledge distillation (KD) objective that aligns its language embeddings with the vision-contextualized language embeddings of the teacher model. VisDNMT achieves consistently larger gains over text-only NMT baselines than state-of-the-art methods on both visually rich and visually sparse text.
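To make the joint objective concrete, the sketch below combines a translation cross-entropy loss with a KD term that pulls the student's source-side embeddings toward the frozen teacher's vision-contextualized embeddings. All names (`VisDNMTLoss`, `kd_weight`) and the choice of MSE as the alignment objective are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the joint training objective described in the abstract.
import torch
import torch.nn as nn


class VisDNMTLoss(nn.Module):
    def __init__(self, kd_weight: float = 1.0, pad_id: int = 0):
        super().__init__()
        self.ce = nn.CrossEntropyLoss(ignore_index=pad_id)
        self.kd = nn.MSELoss()  # one plausible embedding-alignment objective
        self.kd_weight = kd_weight

    def forward(self, logits, target_ids, student_emb, teacher_emb):
        # logits:      (batch, tgt_len, vocab)  student translation logits
        # target_ids:  (batch, tgt_len)         reference target tokens
        # student_emb: (batch, src_len, dim)    student language embeddings
        # teacher_emb: (batch, src_len, dim)    frozen teacher embeddings
        translation_loss = self.ce(
            logits.reshape(-1, logits.size(-1)), target_ids.reshape(-1)
        )
        # Teacher is frozen: detach so no gradient flows into its embeddings.
        distill_loss = self.kd(student_emb, teacher_emb.detach())
        return translation_loss + self.kd_weight * distill_loss


# Minimal smoke test with random tensors.
if __name__ == "__main__":
    loss_fn = VisDNMTLoss(kd_weight=0.5)
    logits = torch.randn(2, 7, 100)
    targets = torch.randint(1, 100, (2, 7))
    s_emb = torch.randn(2, 9, 512)
    t_emb = torch.randn(2, 9, 512)
    print(loss_fn(logits, targets, s_emb, t_emb))
```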
Paper Type: short
Research Area: Machine Translation
Languages Studied: English, German, French