On Learning Universal Representations Across Languages

Published: 12 Jan 2021, Last Modified: 03 Apr 2024 · ICLR 2021 Poster
Keywords: universal representation learning, cross-lingual pretraining, hierarchical contrastive learning
Abstract: Recent studies have demonstrated the overwhelming advantage of cross-lingual pre-trained models (PTMs), such as multilingual BERT and XLM, on cross-lingual NLP tasks. However, existing approaches essentially capture only the co-occurrence among tokens, since they train with the masked language model (MLM) objective and token-level cross entropy. In this work, we extend these approaches to learn sentence-level representations and show their effectiveness on cross-lingual understanding and generation. Specifically, we propose a Hierarchical Contrastive Learning (HiCTL) method to (1) learn universal representations for parallel sentences distributed in one or multiple languages and (2) distinguish, for each sentence, the semantically related words from a shared cross-lingual vocabulary. We conduct evaluations on two challenging cross-lingual tasks, XTREME and machine translation. Experimental results show that HiCTL outperforms the state-of-the-art XLM-R by an absolute gain of 4.2% accuracy on the XTREME benchmark and achieves substantial improvements over strong baselines on both high-resource and low-resource English$\rightarrow$X translation tasks.
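The sentence-level half of such a hierarchical contrastive objective can be illustrated with a minimal sketch: an InfoNCE-style loss that pulls the embeddings of parallel sentences together while pushing apart the other sentences in the batch. This is an illustrative assumption in PyTorch, not the paper's exact formulation; the function name, temperature value, and use of in-batch negatives are hypothetical choices.

```python
import torch
import torch.nn.functional as F

def sentence_level_contrastive_loss(src_emb, tgt_emb, temperature=0.1):
    """In-batch InfoNCE loss over parallel sentence pairs.

    src_emb, tgt_emb: (batch, dim) sentence embeddings of source sentences
    and their translations; row i of each tensor is assumed to be parallel.
    All other sentences in the batch act as negatives.
    """
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    logits = src @ tgt.t() / temperature  # (batch, batch) cosine similarities
    labels = torch.arange(src.size(0), device=src.device)
    # Symmetric loss: source->target and target->source retrieval directions.
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2

# Usage with random embeddings standing in for encoder outputs.
src = torch.randn(8, 768)
tgt = torch.randn(8, 768)
loss = sentence_level_contrastive_loss(src, tgt)
```

The word-level part of HiCTL would analogously contrast related words against the shared cross-lingual vocabulary for each sentence; see the paper for the actual formulation.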
One-sentence Summary: In this work, we extend pre-trained language models to learn universal representations across multiple languages, and show their effectiveness on cross-lingual understanding and generation.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Data: [GLUE](https://paperswithcode.com/dataset/glue), [MRPC](https://paperswithcode.com/dataset/mrpc), [MultiNLI](https://paperswithcode.com/dataset/multinli), [QNLI](https://paperswithcode.com/dataset/qnli), [SST](https://paperswithcode.com/dataset/sst), [SST-2](https://paperswithcode.com/dataset/sst-2), [XNLI](https://paperswithcode.com/dataset/xnli)
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/arxiv:2007.15960/code)