Abstract: A significant roadblock in multilingual neural language modeling is the lack of labeled non-English data. One way to overcome this is to learn cross-lingual text representations that transfer performance from training on English tasks to non-English tasks, despite little to no task-specific non-English data. In this paper, we explore a natural setup for learning cross-lingual sentence representations: the dual-encoder. We provide a comprehensive evaluation of our cross-lingual representations on a number of monolingual, cross-lingual, and zero-shot/few-shot learning tasks, and also analyze the different learned cross-lingual embedding spaces.
Keywords: sentence, embeddings, zero-shot, multilingual, multi-task, cross-lingual
TL;DR: State-of-the-art zero-shot learning performance by using a translation task to bridge multi-task training across languages.
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/learning-cross-lingual-sentence/code)
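To make the dual-encoder setup concrete, below is a minimal, hypothetical PyTorch sketch of the general idea the abstract and TL;DR describe: a shared encoder maps sentences from all languages into one embedding space, and a translation ranking loss with in-batch negatives pulls translation pairs together. The class and function names, the bag-of-tokens encoder, and all hyperparameters are illustrative assumptions, not the paper's actual architecture.

```python
# Hypothetical sketch of a dual-encoder with an in-batch translation ranking
# loss; architecture and hyperparameters are illustrative, not the paper's.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualEncoder(nn.Module):
    def __init__(self, vocab_size: int, dim: int = 512):
        super().__init__()
        # Shared encoder: sentences from every language are mapped into the
        # same vector space. A mean-pooled embedding bag stands in for a real
        # sentence encoder here (simplifying assumption).
        self.embed = nn.EmbeddingBag(vocab_size, dim)

    def encode(self, token_ids: torch.Tensor) -> torch.Tensor:
        # L2-normalize so dot products between embeddings are cosine similarities.
        return F.normalize(self.embed(token_ids), dim=-1)


def translation_ranking_loss(src_vecs: torch.Tensor,
                             tgt_vecs: torch.Tensor,
                             scale: float = 20.0) -> torch.Tensor:
    # Score every source sentence against every target sentence in the batch.
    # The true translation of row i sits on the diagonal; all other rows act
    # as in-batch negatives.
    logits = scale * src_vecs @ tgt_vecs.T
    labels = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)


# Usage: encode a batch of parallel sentence pairs and compute the loss.
model = DualEncoder(vocab_size=32000)
src = model.encode(torch.randint(0, 32000, (8, 16)))  # e.g. English batch
tgt = model.encode(torch.randint(0, 32000, (8, 16)))  # parallel translations
loss = translation_ranking_loss(src, tgt)
```

Because the encoder is shared across languages, a classifier trained on top of English sentence embeddings can then be applied to non-English inputs directly, which is the zero-shot transfer the abstract refers to.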