SSP: Self-Supervised Prompting for Cross-Lingual Transfer to Low-Resource Languages using Large Language Models

SSP: Self-Supervised Prompting for Cross-Lingual Transfer to Low-Resource Languages using Large Language Models

ACL ARR 2024 April Submission659 Authors

16 Apr 2024 (modified: 06 May 2024)ACL ARR 2024 April SubmissionEveryoneRevisionsBibTeXCC BY 4.0

Abstract: Recently, very large language models (LLMs) have shown exceptional performance on several English language NLP tasks with just in-context learning (ICL), but their utility in other languages is still underexplored. We investigate their effectiveness for NLP tasks in low-resource languages (LRLs), especially in the setting of zero-shot cross-lingual transfer (0-CLT), where task-specific training data for one or more related medium-resource languages (MRLs) is available. We introduce Self-Supervised Prompting (SSP), a novel ICL approach for the 0-CLT setting. SSP is based on the key observation that LLMs output more accurate labels if in-context exemplars are from the target language (even if their labels are slightly noisy). To operationalize this, since target language training data is not available in 0-CLT, SSP operates in two stages. In Stage I, using source MRL training data, target language's test data is noisily labeled. In Stage II, these noisy test data points are used as exemplars in ICL for further improved labeling. Additionally, our implementation of SSP uses a novel Integer Linear Programming (ILP)-based exemplar selection that balances similarity, prediction confidence (when available) and label coverage. Experiments on three tasks and twelve LRLs (from three regions) demonstrate that SSP strongly outperforms fine-tuned and other prompting-based baselines.

Paper Type: Long

Research Area: Multilingualism and Cross-Lingual NLP

Research Area Keywords: Low resource languages, Cross-lingual Transfer, LLMs

Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings

Languages Studied: African, Germanic, Indigenous languages of Americas

Section 2 Permission To Publish Peer Reviewers Content Agreement: Authors grant permission for ACL to publish peer reviewers' content

Submission Number: 659

Loading