Robust In-Context Learning via Multi-Armed Bandit-Based Partition Selection

Published: 06 Mar 2025, Last Modified: 30 Apr 2025 · ICLR 2025 Workshop Data Problems Poster · CC BY 4.0
Keywords: In-context Learning, Multi-armed Bandits
TL;DR: UCB-based partition selection improves ICL robustness by mitigating label noise through iterative reliability assessment
Abstract: In-context learning (ICL) enables Large Language Models (LLMs) to adapt to new tasks without parameter updates, relying solely on exemplar selection. However, in real-world scenarios, data partitions may contain corrupted labels, degrading ICL performance. We address this challenge by formulating partition selection as a multi-armed bandit (MAB) problem, where each evaluation sample serves as a pull, allowing the model to identify the most reliable partitions iteratively. Using an Upper Confidence Bound (UCB) strategy, we progressively refine exemplar selection to mitigate the impact of noisy data. Empirical results demonstrate that UCB-based partition selection recovers performance comparable to settings without label noise, highlighting its effectiveness in improving ICL robustness.
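The abstract's formulation (each data partition is an arm, each evaluation sample is a pull, and a UCB strategy steers exemplar selection toward reliable partitions) can be sketched with a standard UCB1 loop. This is a minimal illustration, not the authors' implementation: the per-partition accuracies, the reward model (1 if the ICL prediction is correct, 0 otherwise), and the exploration constant are all assumptions for the demo.

```python
import math
import random

def ucb_select(counts, values, t, c=2.0):
    """Pick a partition (arm) by the UCB1 rule; unpulled arms go first."""
    for i, n in enumerate(counts):
        if n == 0:
            return i
    return max(range(len(counts)),
               key=lambda i: values[i] + math.sqrt(c * math.log(t) / counts[i]))

def run_bandit(accuracies, n_pulls, seed=0):
    """Simulate UCB-based partition selection.

    `accuracies` stands in for each partition's (unknown) reliability:
    the probability that exemplars drawn from it yield a correct ICL
    prediction on an evaluation sample.
    """
    rng = random.Random(seed)
    k = len(accuracies)
    counts = [0] * k      # pulls per partition
    values = [0.0] * k    # running mean reward per partition
    for t in range(1, n_pulls + 1):
        arm = ucb_select(counts, values, t)
        # Reward: whether the model answered correctly with this partition's
        # exemplars (simulated here as a Bernoulli draw).
        reward = 1.0 if rng.random() < accuracies[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values

# Hypothetical setup: partitions 0 and 1 are label-noise-corrupted,
# partition 2 is clean. UCB concentrates pulls on the clean partition.
counts, values = run_bandit([0.55, 0.60, 0.90], n_pulls=500)
```

Because the confidence bonus shrinks as a partition is pulled, the noisy partitions' optimistic estimates fall below the clean partition's mean after a logarithmic number of pulls, so most evaluation budget ends up on the reliable partition.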
Submission Number: 93