XLDA: Cross-Lingual Data Augmentation for Natural Language Inference and Question Answering

Jasdeep Singh; Bryan McCann; Nitish Shirish Keskar; Caiming Xiong; Richard Socher

25 Sept 2019 (modified: 03 Apr 2024); ICLR 2020 Conference Blind Submission; Readers: Everyone

Keywords: cross-lingual, transfer learning, BERT

TL;DR: Translating portions of the input during training can improve cross-lingual performance.

Abstract: While natural language processing systems often focus on a single language, multilingual transfer learning has the potential to improve performance, especially for low-resource languages. We introduce XLDA, cross-lingual data augmentation, a method that replaces a segment of the input text with its translation in another language. XLDA improves performance on all 14 tested languages of the cross-lingual natural language inference (XNLI) benchmark. With improvements of up to 4.8, training with XLDA achieves state-of-the-art performance for Greek, Turkish, and Urdu. XLDA contrasts with, and performs markedly better than, a more naive approach that aggregates examples from various languages such that each example is entirely in one language. On the SQuAD question answering task, XLDA provides a 1.0 performance increase on the English evaluation set. Comprehensive experiments suggest that most languages are effective as cross-lingual augmentors, that XLDA is robust to a wide range of translation quality, and that XLDA is even more effective for randomly initialized models than for pretrained models.
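As a rough illustration of the idea in the abstract, the sketch below builds one augmented example by swapping a single input segment for its translation, and contrasts it with the naive approach in which every example stays in one language. It is not the authors' implementation: the function names `xlda_example` and `monolingual_examples`, the dictionary layout, and the toy sentences are all assumptions made for illustration, and the translations are assumed to be available in advance.

```python
import random

def xlda_example(segments, source_lang, target_langs, rng):
    """Replace one randomly chosen segment with its translation (hypothetical helper).

    segments maps a segment name (e.g. "premise") to a dict of
    language code -> text. All other segments stay in source_lang.
    """
    name = rng.choice(sorted(segments))
    lang = rng.choice([l for l in target_langs if l != source_lang])
    mixed = {
        key: (texts[lang] if key == name else texts[source_lang])
        for key, texts in segments.items()
    }
    return mixed, name, lang

def monolingual_examples(segments, langs):
    """Naive baseline: one copy of the example per language, each entirely in that language."""
    return [{key: texts[l] for key, texts in segments.items()} for l in langs]

# Toy NLI pair with assumed translations (illustrative data only).
demo = {
    "premise": {"en": "A man is playing a guitar.", "de": "Ein Mann spielt Gitarre."},
    "hypothesis": {"en": "A person is making music.", "de": "Eine Person macht Musik."},
}

mixed, swapped_segment, swapped_lang = xlda_example(demo, "en", ["en", "de"], random.Random(0))
baseline = monolingual_examples(demo, ["en", "de"])
```

In this sketch, `mixed` contains one English segment and one German segment, while every entry of `baseline` is in a single language. The label of the example is unchanged by either transformation, so it is omitted here.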

Data: [MultiNLI](https://paperswithcode.com/dataset/multinli), [SQuAD](https://paperswithcode.com/dataset/squad)
