Cross-Lingual Representation Alignment Through Contrastive Image-Caption Tuning

ACL ARR 2025 February Submission 5766 Authors

16 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0
Abstract: Multilingual alignment of sentence representations has mostly required bitexts to bridge the gap between languages. We investigate whether visual information can bridge this gap instead. Image-caption datasets are easy to create and require no multilingual expertise, so they offer a more efficient alternative for low-resource languages. We find that (1) multilingual image-caption alignment can implicitly align text representations across languages, (2) languages unseen by the encoder during pretraining can be incorporated into this alignment post hoc, and (3) the aligned representations are usable for cross-lingual Natural Language Understanding (NLU) and bitext retrieval.
Paper Type: Short
Research Area: Multilingualism and Cross-Lingual NLP
Research Area Keywords: multilingualism, cross-lingual transfer, multilingual representations, less-resourced languages, indigenous languages, resources for less-resourced languages
Contribution Types: Approaches to low-resource settings
Languages Studied: English, Spanish, Hindi, Japanese, Quechua
Submission Number: 5766