Topic-XICL: Demonstration Selection with Topic Inference for Cross-lingual In-context Learning
---
### Abstract

---
Cross-lingual in-context learning (XICL) shows promise for adapting large language models (LLMs) to low-resource languages. Previous methods typically rely on off-the-shelf similarity-based approaches or task-specific retrievers trained with LLM feedback for demonstration selection. However, these methods often overlook important factors beyond a single criterion or can be resource-intensive. To address these challenges, we propose a novel approach called Topic-XICL, which leverages a latent topic model for demonstration selection. We assume that latent topic variables encapsulate information that more accurately characterizes demonstrations. By training this topic model on rich-resource language data with a compact LLM, we obtain more relevant demonstrations through topic inference and apply them for in-context learning across various LLMs. Our method is tested on three multilingual tasks (XNLI, XCOPA, and TyDiQA-GoldP) and three models, each with approximately 7 to 8 billion parameters (BLOOM, Qwen1.5, and Llama3.1). By abstracting multiple factors into topic variables, our approach consistently outperforms random selection, semantic similarity selection, and clustering-based baselines, while requiring only moderate additional resources.

This repository contains two directories, ```src``` and ```data```. The training, inference, and test code for the Topic-XICL model is in ```src```, and all training and test datasets are in ```data```.


### Environment

---
- GPU: NVIDIA GeForce RTX 3090 (24 GB)
- Python: 3.10.14
- torch: 2.3.1
- CUDA: 12.3
- transformers: 4.44.2
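
Before running the code, it can help to confirm that the installed packages match the versions listed above. The following is a minimal sketch and is not part of this repository: the helper name `check_versions` and the `EXPECTED` dictionary are hypothetical, and the script relies only on the standard-library `importlib.metadata` module. It assumes that a local build suffix on the torch version (for example `+cu121`) can be ignored when comparing.

```python
from importlib import metadata

# Package versions taken from the Environment list in this README.
EXPECTED = {"torch": "2.3.1", "transformers": "4.44.2"}


def check_versions(expected, get_version=metadata.version):
    """Return {package: (expected, installed_or_None, matches)}.

    `get_version` is injectable so the check can be exercised without
    the packages installed (hypothetical helper, not repository code).
    """
    report = {}
    for name, want in expected.items():
        try:
            have = get_version(name)
        except metadata.PackageNotFoundError:
            have = None  # package is not installed
        # Assumption: drop any local build suffix such as "+cu121"
        # and compare only the public version string.
        matches = have is not None and have.split("+")[0] == want
        report[name] = (want, have, matches)
    return report


if __name__ == "__main__":
    for name, (want, have, ok) in check_versions(EXPECTED).items():
        status = "ok" if ok else "differs"
        print(f"{name}: expected {want}, installed {have} ({status})")
```

A mismatch does not necessarily mean the code will fail; the list above only records the configuration the authors report.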

### Usage

```
bash run.sh
```

