Topic-XICL: Demonstration Selection with Topic Inference for Cross-lingual In-context Learning

ACL ARR 2024 June Submission3983 Authors

16 Jun 2024 (modified: 22 Jul 2024) · ACL ARR 2024 June Submission · License: CC BY 4.0
Abstract: Cross-lingual in-context learning (XICL) shows promise for adapting large language models (LLMs) to low-resource languages. Previous methods rely on off-the-shelf retrievers or task-specific retrievers trained on LLM feedback signals for demonstration selection. However, these approaches often neglect factors beyond semantic similarity and can be resource-intensive. To address these challenges, we propose a novel approach called Topic-XICL, which leverages a latent topic model to select demonstrations for XICL. We assume that latent topic variables encapsulate information that more accurately characterizes demonstrations. By training this topic model on high-resource language data with a small LLM, we obtain more informative demonstrations through topic inference and use them for in-context learning across various LLMs. Our method is evaluated on three multilingual tasks (XNLI, XCOPA, and TyDiQA-GoldP) with three models of approximately 7 billion parameters: two multilingual LLMs (BLOOM and XGLM) and an English-centric model (Llama2). Comparative evaluations against random-selection, semantic-similarity, and clustering-based baselines show consistent improvements in multilingual performance with our approach.
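The abstract's core selection step can be illustrated with a minimal sketch (this is not the paper's implementation): assuming each demonstration and the query have already been assigned topic distributions by a trained topic model, demonstrations closest to the query in topic space are selected as in-context examples. The `cosine` similarity measure and the example distributions are assumptions for illustration only.

```python
import math

def cosine(p, q):
    """Cosine similarity between two topic-distribution vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

def select_demonstrations(query_topics, demo_topics, k=2):
    """Return indices of the k demonstrations whose (pre-inferred)
    topic distributions are most similar to the query's."""
    ranked = sorted(range(len(demo_topics)),
                    key=lambda i: cosine(query_topics, demo_topics[i]),
                    reverse=True)
    return ranked[:k]

# Hypothetical 3-topic distributions produced by a trained topic model.
demos = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6], [0.7, 0.2, 0.1]]
query = [0.75, 0.15, 0.10]
print(select_demonstrations(query, demos, k=2))  # → [0, 3]
```

In practice the distance measure, the topic model, and the number of demonstrations would follow the paper's actual design choices.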
Paper Type: Long
Research Area: Multilingualism and Cross-Lingual NLP
Research Area Keywords: multilingualism, cross-lingual transfer
Contribution Types: Approaches to low-resource settings
Languages Studied: Arabic, Bengali, Bulgarian, Chinese, English, Estonian, Finnish, French, German, Greek, Haitian, Indonesian, Italian, Korean, Quechua, Russian, Spanish, Swahili, Tamil, Telugu, Thai, Turkish, Vietnamese
Submission Number: 3983