Enhancing Cross-Lingual Training with Knowledge Learned from Multi-Lingual Training

Anonymous

08 Mar 2022 (modified: 05 May 2023) · NAACL 2022 Conference Blind Submission · Readers: Everyone
Paper Link: https://openreview.net/forum?id=lYGuoGhCY1a
Paper Type: Long paper (up to eight pages of content + unlimited references and appendices)
Abstract: On multi-lingual natural language processing (NLP) tasks, it is generally agreed that multi-lingual models perform better than cross-lingual models, even when only limited training data is available in the target languages. Though this is expected, the cause has not been well studied. In this paper, we examine the differences between cross- and multi-lingual models fine-tuned on syntactic, semantic, or sentiment analysis (SA) tasks, from the perspectives of parameter updates, feature extraction, and domain changes, in order to investigate the advantage of multi-lingual training. Additionally, we incorporate the knowledge learned from these analyses into the training process of cross-lingual models to improve their performance. Results show that jointly applying feature augmentation and domain adaptation approaches effectively improves the performance of vanilla cross-lingual models, with average F1-macro score improvements ranging from 0.38% to 20.75% across four NLP tasks. Our studies indicate that the effectiveness of cross-lingual training can be enhanced without requiring additional labeled data in the target languages. This provides an alternative to data augmentation for future research on resource-scarce languages.
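The abstract mentions combining feature augmentation with domain adaptation on top of a cross-lingual model, but does not describe the architecture. The sketch below is only an illustration of one plausible instantiation, assuming an XLM-R encoder, feature augmentation via concatenating the [CLS] vector with a mean-pooled sentence vector, and DANN-style domain adaptation via a gradient-reversal domain classifier; the class and parameter names (FeatureAugmentedClassifier, GradReverse, lambda_domain) are hypothetical and not taken from the paper.

```python
# Hypothetical sketch: feature augmentation + adversarial domain adaptation
# on a multilingual encoder. Not the paper's actual implementation.
import torch
import torch.nn as nn
from transformers import AutoModel


class GradReverse(torch.autograd.Function):
    """Gradient reversal layer (DANN-style) for domain-adversarial training."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) gradients flowing back to the encoder.
        return -ctx.lambd * grad_output, None


class FeatureAugmentedClassifier(nn.Module):
    def __init__(self, encoder_name="xlm-roberta-base", num_labels=2,
                 num_domains=2, lambda_domain=0.1):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # Feature augmentation: concatenate [CLS] with mean-pooled features.
        self.task_head = nn.Linear(2 * hidden, num_labels)
        self.domain_head = nn.Linear(2 * hidden, num_domains)
        self.lambda_domain = lambda_domain

    def forward(self, input_ids, attention_mask, labels=None, domain_labels=None):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        hidden_states = out.last_hidden_state                  # (B, T, H)
        cls_vec = hidden_states[:, 0]                           # (B, H)
        mask = attention_mask.unsqueeze(-1).float()
        mean_vec = (hidden_states * mask).sum(1) / mask.sum(1)  # (B, H)
        feats = torch.cat([cls_vec, mean_vec], dim=-1)          # augmented features

        task_logits = self.task_head(feats)
        # Domain classifier trained adversarially to encourage
        # source/target-invariant representations.
        domain_logits = self.domain_head(GradReverse.apply(feats, self.lambda_domain))

        loss = None
        if labels is not None and domain_labels is not None:
            ce = nn.CrossEntropyLoss()
            loss = ce(task_logits, labels) + ce(domain_logits, domain_labels)
        return loss, task_logits
```

In this sketch, the domain labels would distinguish source-language (labeled) from target-language (unlabeled) examples, so the domain loss can use target-language text without requiring additional labeled data, consistent with the abstract's claim.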