Keywords: Large Language Models, In-context learning, Continual learning in LLMs, Post-training adaptation, Retrieval-augmented generation, Parameter-efficient fine-tuning
Abstract: Large language models (LLMs) cannot accumulate experience across interactions without parameter updates. Retrieval-augmented generation and memory-based approaches attempt to leverage past interactions but typically rely on semantic similarity alone and ignore whether experiences actually improve performance. We introduce an uncertainty-aware guidance framework that distills compact guidance from past failures and selects it via a contextual bandit formulation. Each guidance item maintains a Beta posterior over effectiveness, and Thompson sampling balances exploration and exploitation, allowing the model to downweight unhelpful guidance over time. Across benchmarks, our method corrects up to 69.5% of prior errors and improves Haiku 4.5 accuracy by up to 26%. Notably, guidance distilled from a weaker open-weight model (Qwen3 4B) transfers effectively to a stronger proprietary model (Haiku 4.5), demonstrating experience exchange across models in the context space.
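The selection mechanism in the abstract (a Beta posterior per guidance item, sampled via Thompson sampling) can be illustrated with a minimal sketch. This is not the authors' implementation: it drops the contextual part of the bandit formulation, assumes a uniform Beta(1, 1) prior and a binary "helped / did not help" reward, and the names `GuidanceItem`, `select_guidance`, `update`, and `simulate` are hypothetical. The simulated success rates are made-up inputs for illustration only, not results from the paper.

```python
import random
from dataclasses import dataclass


@dataclass
class GuidanceItem:
    """One distilled guidance string with a Beta posterior over its effectiveness."""
    text: str
    alpha: float = 1.0  # 1 + observed successes (assumed uniform Beta(1, 1) prior)
    beta: float = 1.0   # 1 + observed failures


def select_guidance(items, rng):
    """Thompson sampling: draw one sample from each posterior, keep the largest draw."""
    return max(items, key=lambda item: rng.betavariate(item.alpha, item.beta))


def update(item, helped):
    """Update the chosen item's posterior with a binary outcome (assumed reward)."""
    if helped:
        item.alpha += 1.0
    else:
        item.beta += 1.0


def simulate(true_rates, rounds=2000, seed=0):
    """Toy loop: each item helps with a fixed, hypothetical probability."""
    rng = random.Random(seed)
    items = [GuidanceItem(text=f"guidance-{i}") for i in range(len(true_rates))]
    index = {item.text: i for i, item in enumerate(items)}
    counts = [0] * len(items)
    for _ in range(rounds):
        chosen = select_guidance(items, rng)
        i = index[chosen.text]
        counts[i] += 1
        update(chosen, rng.random() < true_rates[i])
    return items, counts


if __name__ == "__main__":
    items, counts = simulate([0.2, 0.8])
    for item, n in zip(items, counts):
        mean = item.alpha / (item.alpha + item.beta)
        print(f"{item.text}: selected {n} times, posterior mean {mean:.2f}")
```

In this toy setting the item with the lower success rate is sampled less and less often as its posterior concentrates, which is the downweighting behavior the abstract describes. A contextual variant would condition the posterior or the candidate set on the current query, which this sketch does not attempt.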
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 53