Keywords: Activation Steering, Inference-Time Intervention, Controllable Generation
TL;DR: We propose the Activation State Machine, a mechanism for context-dependent steering of LLMs toward better mathematical and physical reasoning that stays on the Pareto front of the accuracy-fluency tradeoff.
Abstract: Fine-tuning Large Language Models (LLMs) for specialized skills often leads to catastrophic forgetting, degrading their general capabilities. Activation steering offers a promising alternative, but existing methods are typically stateless, applying a constant intervention that fails to capture the dynamic, history-dependent nature of a reasoning process. We introduce the Activation State Machine (ASM), a lightweight, dynamic steering mechanism inspired by state-space models from control theory. The ASM learns the latent dynamics of an ideal reasoning trajectory from a set of examples and, at inference time, applies real-time corrective interventions to the LLM's hidden states. We demonstrate that ASM steering improves zero-shot accuracy on both mathematical and physical reasoning benchmarks. We further show that whereas supervised fine-tuning causes a significant performance drop on an unrelated creative writing task, our method preserves over 95% of the base model's fluency, as measured by perplexity. Our work presents a new paradigm for modular skill injection, enabling the enhancement of specialized capabilities in LLMs without compromising their foundational generality.
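To make the state-space framing concrete, here is a minimal sketch of what an ASM-style dynamic intervention might look like. The abstract does not specify the parameterization, so the linear state-space maps `A`, `B`, `C`, the `tanh` nonlinearity, the steering gain `alpha`, and the per-token update scheme below are all illustrative assumptions, not the authors' exact method:

```python
# Illustrative ASM-style dynamic steering sketch (assumed parameterization).
# A small learned state-space model tracks a latent "reasoning state" over the
# generation and emits a state-dependent correction to the LLM's hidden states,
# in contrast to stateless steering, which adds one constant vector everywhere.
import torch
import torch.nn as nn

class ActivationStateMachineSketch(nn.Module):
    def __init__(self, hidden_dim: int, state_dim: int = 64, alpha: float = 0.1):
        super().__init__()
        self.A = nn.Linear(state_dim, state_dim, bias=False)   # latent dynamics
        self.B = nn.Linear(hidden_dim, state_dim, bias=False)  # activation -> state
        self.C = nn.Linear(state_dim, hidden_dim, bias=False)  # state -> correction
        self.alpha = alpha                                     # steering strength
        self.state = None

    def reset(self):
        # Clear the latent state before each new generation.
        self.state = None

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, hidden_dim) activation at one layer for the current token.
        if self.state is None:
            self.state = torch.zeros(hidden.size(0), self.A.in_features,
                                     device=hidden.device, dtype=hidden.dtype)
        # Advance the latent state from its own dynamics plus the observed activation.
        self.state = torch.tanh(self.A(self.state) + self.B(hidden))
        # Apply a history-dependent corrective intervention to the hidden state.
        return hidden + self.alpha * self.C(self.state)
```

At inference time, such a module could plausibly be attached as a forward hook on a chosen transformer layer, updating its latent state once per generated token; the hook placement and training objective (e.g., regressing toward activations from ideal reasoning traces) are likewise assumptions here.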
Primary Area: generative models
Submission Number: 14368