Keywords: Large Language Models, Bayesian Models, In-Context Learning, Interpretability
Abstract: Large language models (LLMs) can be controlled through prompts (in-context learning) and internal activations (activation steering), but a unified theory explaining these methods is lacking, so applying them often relies on trial and error. Here, we develop a unifying predictive account of LLM control from a Bayesian perspective, proposing that both context- and activation-based interventions impact behavior by shifting the model's belief in latent concepts. Under our framework, steering operates by shifting concept priors, while in-context learning accumulates evidence for a concept. This theory predicts three key phenomena, which we verify empirically: (i) sigmoidal learning curves as in-context evidence accumulates, (ii) predictable shifts of these curves under activation steering, and (iii) additive effects of the two interventions, creating distinct behavioral phases. Our framework yields a closed-form model that is highly predictive of LLM behavior across context- and activation-based interventions in five domains inspired by prior work on many-shot in-context learning. Crucially, this model also predicts the precise crossover boundaries where these interventions trigger sudden behavioral shifts. Taken together, our framework offers a unified account of prompt-based and activation-based control of LLM behavior, and a methodology for empirically predicting the effects of these interventions.
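To make the abstract's claims concrete, here is a minimal sketch of the kind of closed-form model it describes, under assumed functional forms that are not given in the abstract itself: steering adds a shift to the prior log-odds of a latent concept, each in-context example contributes a fixed increment of log-likelihood evidence, and the posterior belief is a sigmoid of their sum. All names and parameters (`evidence_per_example`, `prior_logodds`, `steer_shift`) are hypothetical illustrations, not the paper's actual notation.

```python
import numpy as np

def posterior_belief(n_examples, evidence_per_example, prior_logodds, steer_shift=0.0):
    """Assumed closed form: posterior belief in a latent concept.

    Steering shifts the prior log-odds by `steer_shift`; each in-context
    example adds `evidence_per_example` to the log-odds. The posterior is
    then a sigmoid in the number of examples, which would produce the
    sigmoidal learning curves, predictable steering shifts, and additive
    effects the abstract describes.
    """
    logodds = prior_logodds + steer_shift + n_examples * evidence_per_example
    return 1.0 / (1.0 + np.exp(-logodds))

def crossover_n(evidence_per_example, prior_logodds, steer_shift=0.0):
    """Number of examples at which the posterior crosses 0.5, i.e. where
    accumulated evidence exactly cancels the (steered) prior. This is the
    kind of crossover boundary between behavioral phases the abstract
    claims the model predicts."""
    return -(prior_logodds + steer_shift) / evidence_per_example
```

Under this additive-log-odds assumption, steering and in-context evidence are interchangeable currencies: a steering shift of +2 moves the crossover by the same amount as 2 / `evidence_per_example` fewer in-context examples.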
Primary Area: interpretability and explainable AI
Submission Number: 23195