CCIM: Cross-modal Cross-lingual Interactive Image Translation

Published: 07 Oct 2023, Last Modified: 01 Dec 2023
Venue: EMNLP 2023 Findings
Submission Type: Regular Short Paper
Submission Track: Speech and Multimodality
Submission Track 2: Machine Translation
Keywords: cross-modal cross-lingual interactive decoding, text image machine translation, text image recognition
TL;DR: A novel Cross-modal Cross-lingual Interactive text image translation model is proposed to incorporate source language information by synchronously generating source and target language results through an interactive attention mechanism.
Abstract: Text image machine translation (TIMT), which translates source language text images into target language texts, has attracted intensive attention in recent years. Although an end-to-end TIMT model directly generates the target translation from encoded text image features with an efficient architecture, it lacks the recognized source language information, which degrades translation performance. In this paper, we propose a novel Cross-modal Cross-lingual Interactive Model (CCIM) that incorporates source language information by synchronously generating source language and target language results through an interactive attention mechanism between two language decoders. Extensive experimental results show that the interactive decoder significantly outperforms end-to-end TIMT models and achieves faster decoding with a smaller model size than cascade models.
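To make the described architecture concrete, the following is a minimal PyTorch sketch of one layer with two synchronized decoder streams (source recognition and target translation) that attend to the encoded text image features and to each other. All class names, parameter choices, and the residual layout are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class InteractiveDecoderLayer(nn.Module):
    """Hypothetical sketch: one step of two synchronized decoders whose
    hidden states interact through cross-attention (not the paper's code)."""

    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        # Self-attention within each language stream.
        self.src_self = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.tgt_self = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Cross-modal attention into the encoded text image features.
        self.src_img = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.tgt_img = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Cross-lingual interactive attention: each stream reads the other.
        self.src_to_tgt = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.tgt_to_src = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Position-wise feed-forward blocks for each stream.
        self.src_ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                     nn.Linear(4 * d_model, d_model))
        self.tgt_ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                     nn.Linear(4 * d_model, d_model))

    def forward(self, src_h, tgt_h, image_feats):
        # Self-attention over each stream's own decoded prefix.
        src_h = src_h + self.src_self(src_h, src_h, src_h)[0]
        tgt_h = tgt_h + self.tgt_self(tgt_h, tgt_h, tgt_h)[0]
        # Cross-modal attention: both streams read the text image encoder.
        src_h = src_h + self.src_img(src_h, image_feats, image_feats)[0]
        tgt_h = tgt_h + self.tgt_img(tgt_h, image_feats, image_feats)[0]
        # Interactive attention: the translation stream attends to the
        # recognition stream (and vice versa), so recognized source
        # information flows into the target decoder at every step.
        new_tgt = tgt_h + self.tgt_to_src(tgt_h, src_h, src_h)[0]
        new_src = src_h + self.src_to_tgt(src_h, tgt_h, tgt_h)[0]
        return new_src + self.src_ffn(new_src), new_tgt + self.tgt_ffn(new_tgt)


# Toy usage: batch of 2, 10 image feature positions, 5 decoded tokens per stream.
layer = InteractiveDecoderLayer()
image_feats = torch.randn(2, 10, 256)
src_h, tgt_h = torch.randn(2, 5, 256), torch.randn(2, 5, 256)
src_h, tgt_h = layer(src_h, tgt_h, image_feats)
print(src_h.shape, tgt_h.shape)  # torch.Size([2, 5, 256]) for each stream
```

In this sketch the single shared image encoder and the per-step interaction between the two decoders are what distinguish the approach from a cascade (separate recognition then translation models) and from a purely end-to-end decoder that never exposes the recognized source text.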
Submission Number: 945