Globetrotter: Unsupervised Multilingual Translation from Visual Alignment

28 Sept 2020 (modified: 22 Oct 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Keywords: cross-modal, multilingual, unsupervised translation, visual similarity
Abstract: Machine translation in a multi-language scenario requires large-scale parallel corpora for every language pair. Unsupervised translation is challenging because there is no explicit connection between languages, so existing methods must rely on topological properties of the language representations. We introduce a framework that leverages visual similarity to align multiple languages, using images as the bridge between them. We estimate the cross-modal alignment between language and images, and use this estimate to guide the learning of cross-lingual representations. Our language representations are trained jointly in one model with a single stage. Experiments with fifty-two languages show that our method outperforms prior work on unsupervised word-level and sentence-level translation using retrieval.
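The pivoting idea in the abstract, using images as a shared bridge between languages, can be sketched as follows. This is a minimal illustration, not the paper's actual architecture: the embeddings, dimensions, and noise model are assumptions, standing in for learned encoders that place captions near their associated images in a shared space.

```python
import numpy as np

def cosine_sim(a, b):
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

rng = np.random.default_rng(0)

# Hypothetical embeddings in a shared space: each image acts as the visual
# "bridge" for one caption in each language; no parallel text is involved.
images = rng.normal(size=(4, 8))
sent_en = images + 0.1 * rng.normal(size=(4, 8))  # English captions near their image
sent_fr = images + 0.1 * rng.normal(size=(4, 8))  # French captions near the same image

# Cross-modal alignment enables cross-lingual retrieval: an English sentence
# retrieves the French sentence whose embedding (via its image) is closest.
en_to_img = np.argmax(cosine_sim(sent_en, images), axis=1)
en_to_fr = np.argmax(cosine_sim(sent_en, sent_fr), axis=1)

print(en_to_img)  # each sentence matches its own image
print(en_to_fr)   # and, through the shared space, its translation
```

In the actual system the shared space is learned, e.g. by training the sentence and image encoders so that aligned caption-image pairs score higher than mismatched ones; here the alignment is built in by construction to show how the visual pivot supports retrieval-based translation.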
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
One-sentence Summary: We propose a method that leverages cross-modal alignment between language and vision to train a multilingual translation system without any parallel corpora.
Community Implementations: 1 code implementation (https://www.catalyzex.com/paper/arxiv:2012.04631/code)
Reviewed Version (pdf): https://openreview.net/references/pdf?id=J8BUMatCxQ
