Keywords: Multilingual and Cross-lingual, In-context Learning, Demonstration Selection for ICL
Abstract: Cross-lingual in-context learning (XICL) shows promise for adapting large language models (LLMs) to low-resource languages. Previous methods typically select demonstrations with off-the-shelf similarity-based approaches or with task-specific retrievers trained on LLM feedback. However, the former often reduce selection to a single criterion, overlooking other important factors, while the latter can be resource-intensive. To address these challenges, we propose a novel approach called Topic-XICL, which leverages a latent topic model for demonstration selection. We assume that latent topic variables capture information that characterizes demonstrations more accurately. By training this topic model on high-resource language data with a compact LLM, we obtain more relevant demonstrations through topic inference and apply them for in-context learning across various LLMs. We evaluate our method on three multilingual tasks (XNLI, XCOPA, and TyDiQA-GoldP) using three models with 7 to 8 billion parameters (BLOOM, Qwen1.5, and Llama3.1). Our approach outperforms the baselines (random selection, semantic similarity, and clustering-based methods) on TyDiQA-GoldP, XCOPA, and XNLI by 3.32%, 2.47%, and 1.77%, respectively, while requiring only moderate additional resources.
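Since the abstract only describes the selection step at a high level, the following is a minimal runnable sketch, not the authors' implementation, of topic-based demonstration selection. All names here (`infer_topic_distribution`, `select_demonstrations`, `num_topics`) are hypothetical stand-ins, and the topic model itself is mocked: in the paper it would be a latent topic model trained on high-resource language data with a compact LLM.

```python
# Hypothetical sketch of topic-based demonstration selection (Topic-XICL-style).
# The topic model is a placeholder; in the paper, topic distributions would be
# inferred by a latent topic model trained with a compact LLM.
import numpy as np


def infer_topic_distribution(text: str, num_topics: int = 16) -> np.ndarray:
    """Placeholder: return a normalized topic distribution for `text`.

    Faked with a hash-seeded random distribution so the sketch runs;
    a real implementation would query the trained topic model.
    """
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    probs = rng.random(num_topics)
    return probs / probs.sum()


def select_demonstrations(query: str, pool: list[str], k: int = 4) -> list[str]:
    """Rank candidate demonstrations by topic similarity to the query."""
    q = infer_topic_distribution(query)

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        # Cosine similarity between two topic distributions.
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    ranked = sorted(pool, key=lambda d: cosine(q, infer_topic_distribution(d)),
                    reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    candidates = ["demo 1 ...", "demo 2 ...", "demo 3 ...", "demo 4 ..."]
    print(select_demonstrations("a test query in a low-resource language", candidates, k=2))
```

The selected demonstrations would then be prepended to the query as the in-context prompt for the target LLM; the scoring metric (cosine here) is an assumption, as the paper may use a different topic-similarity measure.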
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 11389