Test-Time Adaptation via Many-Shot Prompting: Benefits, Limits, and Pitfalls

Shubhangi Upasani; Chen Wu; Jay Rainton; Changran Hu; Qizheng Zhang; Bo Li; Urmish Thakker

Test-Time Adaptation via Many-Shot Prompting: Benefits, Limits, and Pitfalls

Shubhangi Upasani, Chen Wu, Jay Rainton, Changran Hu, Qizheng Zhang, Bo Li, Urmish Thakker

Published: 05 Mar 2026, Last Modified: 12 Mar 2026ICLR 2026 Workshop RSI ShortPaperEveryoneRevisionsCC BY 4.0

Keywords: Test-time adaptation, test-time updates, many-shot prompting, Dynamic and Reinforced In-context learning

Abstract: Test-time adaptation enables large language models (LLMs) to modify their behavior at inference without updating model parameters. A common approach is many-shot prompting, where large numbers of in-context learning (ICL) examples are injected as an input-space test-time update. Although performance can improve as more demonstrations are added, the reliability and limits of this update mechanism remain poorly understood, particularly for open-source models. We present an empirical study of many-shot prompting across tasks and model backbones, analyzing how performance varies with update magnitude, example ordering, and selection policy. We further study Dynamic and Reinforced ICL as alternative test-time update strategies that control which information is injected and how it constrains model behavior. We find that many-shot prompting is effective for structured tasks where demonstrations provide high information gain, but is highly sensitive to selection strategy and often shows limited benefits for open-ended generation tasks. Overall, we characterize the practical limits of prompt-based test-time adaptation and outline when input-space updates are beneficial versus harmful.

Email Sharing: We authorize the sharing of all author emails with Program Chairs.

Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.

Submission Number: 50

Loading