Keywords: Multilingual and Cross-lingual, In-context Learning, Demonstration Selection for ICL
Abstract: Cross-lingual in-context learning (XICL) shows promise for adapting large language models (LLMs) to low-resource languages. Previous methods typically select demonstrations with off-the-shelf similarity-based approaches or with task-specific retrievers trained on LLM feedback. However, the former often reduce selection to a single criterion, overlooking other important factors, while the latter can be resource-intensive. To address these challenges, we propose a novel approach called Topic-XICL, which leverages a latent topic model for demonstration selection. We assume that latent topic variables capture information that characterizes demonstrations more accurately. By training this topic model on high-resource language data with a compact LLM, we obtain more relevant demonstrations through topic inference and apply them for in-context learning across various LLMs. We evaluate our method on three multilingual tasks (XNLI, XCOPA, and TyDiQA-GoldP) using three models with 7 to 8 billion parameters (BLOOM, Qwen1.5, and Llama3.1). Our approach outperforms the baselines (random selection, semantic similarity, and clustering-based methods) on TyDiQA-GoldP, XCOPA, and XNLI by 3.32%, 2.47%, and 1.77%, respectively, while requiring only moderate additional resources.
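Since the abstract only describes the selection step at a high level, the following is a minimal runnable sketch, not the authors' implementation, of topic-based demonstration selection. All names here (`infer_topic_distribution`, `select_demonstrations`, `num_topics`) are hypothetical stand-ins, and the topic model itself is mocked: in the paper it would be a latent topic model trained on high-resource language data with a compact LLM.

```python
# Hypothetical sketch of topic-based demonstration selection (Topic-XICL-style).
# The topic model is a placeholder; in the paper, topic distributions would be
# inferred by a latent topic model trained with a compact LLM.
import numpy as np


def infer_topic_distribution(text: str, num_topics: int = 16) -> np.ndarray:
    """Placeholder: return a normalized topic distribution for `text`.

    Faked with a hash-seeded random distribution so the sketch runs;
    a real implementation would query the trained topic model.
    """
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    probs = rng.random(num_topics)
    return probs / probs.sum()


def select_demonstrations(query: str, pool: list[str], k: int = 4) -> list[str]:
    """Rank candidate demonstrations by topic similarity to the query."""
    q = infer_topic_distribution(query)

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        # Cosine similarity between two topic distributions.
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    ranked = sorted(pool, key=lambda d: cosine(q, infer_topic_distribution(d)),
                    reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    candidates = ["demo 1 ...", "demo 2 ...", "demo 3 ...", "demo 4 ..."]
    print(select_demonstrations("a test query in a low-resource language", candidates, k=2))
```

The selected demonstrations would then be prepended to the query as the in-context prompt for the target LLM; the scoring metric (cosine here) is an assumption, as the paper may use a different topic-similarity measure.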
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 11389