ELDA: Enhancing Multi-Modal Machine Translation via Large Language Model-Driven Data Augmentation

Published: 01 Jan 2024 · Last Modified: 12 Apr 2025 · MLNLP 2024 · CC BY-SA 4.0
Abstract: Multi-modal machine translation (MMT) leverages auxiliary information to reduce ambiguity and semantic distortion compared to traditional neural machine translation. However, a significant challenge for current MMT methods is their inability to surpass strong pre-trained machine translation models, mainly due to the scarcity of triplet training data. A common approach incorporates extensive parallel and monolingual data to train the text and visual models separately; however, collecting external data is labor-intensive and time-consuming, and often introduces distribution shift. To address this challenge, we introduce ELDA, a novel low-cost data augmentation method driven by large language models that automatically expands the dataset. We design a carefully crafted multi-attribute prompt, combined with in-context learning, to guide GPT-3.5 in generating diverse yet stylistically consistent samples. Extensive experiments on three benchmark datasets show improvements of up to 3.26 BLEU and 2.64 METEOR over current leading MMT methods on downstream MMT tasks, confirming the effectiveness of ELDA.
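The abstract describes guiding GPT-3.5 with a multi-attribute prompt plus in-context examples to generate style-consistent augmentation samples. The following is a minimal, hypothetical sketch of what such prompt construction might look like; the attribute names, example pairs, and template wording are illustrative assumptions, not ELDA's actual prompt.

```python
# Hypothetical sketch of multi-attribute, in-context-learning prompt
# construction for LLM-driven data augmentation. All names and the
# template are assumptions for illustration, not the paper's method.

def build_augmentation_prompt(examples, attributes, source_caption):
    """Assemble a few-shot prompt asking an LLM (e.g. GPT-3.5) to produce
    a translation pair in the same style as the in-context examples."""
    attr_line = ", ".join(f"{k}: {v}" for k, v in attributes.items())
    lines = [
        "You are generating caption pairs for multi-modal machine translation.",
        f"Target attributes -> {attr_line}",
        "Follow the style of these examples:",
    ]
    for src, tgt in examples:  # in-context demonstrations
        lines.append(f"English: {src}\nGerman: {tgt}")
    lines.append(f"English: {source_caption}\nGerman:")  # query to complete
    return "\n".join(lines)

prompt = build_augmentation_prompt(
    examples=[("A man rides a bike.", "Ein Mann fährt Fahrrad.")],
    attributes={"domain": "everyday scenes", "length": "short"},
    source_caption="A dog runs on the beach.",
)
print(prompt)
```

The resulting string would then be sent to the LLM's chat/completions endpoint; varying the attribute values (domain, length, register, etc.) is one plausible way to obtain the diversity the abstract mentions while the fixed examples anchor the style.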