Letting Uncertainty Guide Your Multimodal Machine Translation

Published: 07 May 2025, Last Modified: 13 Jun 2025, UAI 2025 Poster, CC BY 4.0
Keywords: Multimodal Machine Translation (MMT); Uncertainty Modeling; Cross-modal Fusion; Evidence-based Translation; Neural Machine Translation (NMT)
TL;DR: The proposed UG-MMT framework uses uncertainty modeling to enhance multimodal machine translation, optimizing the use of visual information to reduce ambiguity and achieving state-of-the-art results on the Multi30K dataset.
Abstract: Multimodal Machine Translation (MMT) leverages additional modalities, such as visual data, to improve translation accuracy and resolve linguistic ambiguities inherent in text-only approaches. Recent advances predominantly integrate image information via attention mechanisms or feature-fusion techniques. However, current approaches lack explicit mechanisms to quantify and manage uncertainty during the translation process, so the use of image information remains a black box. This makes it difficult to address the incomplete utilization of visual information, and even the potential degradation of translation quality that visual input can cause. To address these challenges, we introduce a novel Uncertainty-Guided Multimodal Machine Translation (UG-MMT) framework that redefines how translation systems handle ambiguity through systematic uncertainty reduction. Designed with plug-and-play flexibility, our framework integrates seamlessly into existing MMT systems, requiring minimal modification while delivering significant performance gains.
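The page does not include the method's code, so as a purely illustrative sketch of the kind of plug-and-play mechanism the abstract describes, the snippet below gates cross-modal fusion by the text model's predictive entropy. The class name `UncertaintyGatedFusion`, the entropy-based gate, and all tensor shapes are assumptions for illustration, not the authors' actual UG-MMT design.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class UncertaintyGatedFusion(nn.Module):
    """Hypothetical plug-in layer: mix visual context into decoder states
    in proportion to the text-only model's predictive uncertainty."""

    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        # Cross-attention: decoder states query image-region features.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, dec_states, text_logits, img_feats):
        # dec_states:  (B, T, d) decoder hidden states
        # text_logits: (B, T, V) text-only next-token logits
        # img_feats:   (B, R, d) image-region features
        probs = F.softmax(text_logits, dim=-1)
        # Token-level predictive entropy, normalized by log V to lie in [0, 1].
        entropy = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1)
        gate = (entropy / math.log(text_logits.size(-1))).unsqueeze(-1)

        # Per-token visual context attended from image regions.
        vis_ctx, _ = self.cross_attn(dec_states, img_feats, img_feats)

        # Uncertain tokens lean on the image; confident tokens stay text-driven.
        return self.norm(dec_states + gate * vis_ctx)

# Toy shapes: batch 2, 10 target tokens, 512-dim model, 32k vocab, 49 image regions.
fusion = UncertaintyGatedFusion(d_model=512)
fused = fusion(torch.randn(2, 10, 512),
               torch.randn(2, 10, 32000),
               torch.randn(2, 49, 512))
print(fused.shape)  # torch.Size([2, 10, 512])
```

The gate routes more visual context to tokens the text-only decoder is unsure about, which mirrors the abstract's goal of using images specifically to reduce ambiguity rather than fusing them unconditionally.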
Supplementary Material: zip
Latex Source Code: zip
Signed PMLR Licence Agreement: pdf
Submission Number: 345