Keywords: Activation Steering, Inference-Time Intervention, Controllable Generation
TL;DR: We propose the Activation State Machine, a mechanism for context-dependent steering of LLMs toward better mathematical and physical reasoning that stays on the Pareto front of the accuracy-fluency tradeoff.
Abstract: Fine-tuning Large Language Models (LLMs) for specialized skills often leads to catastrophic forgetting, degrading their general capabilities. Activation steering offers a promising alternative, but existing methods are typically stateless, applying a constant intervention that fails to capture the dynamic, history-dependent nature of a reasoning process. We introduce the Activation State Machine (ASM), a lightweight, dynamic steering mechanism inspired by state-space models from control theory. The ASM learns the latent dynamics of an ideal reasoning trajectory from a set of examples and, at inference time, applies real-time corrective interventions to the LLM's hidden states. We demonstrate that ASM steering improves zero-shot accuracy on both mathematical and physical reasoning benchmarks. We further show that whereas supervised fine-tuning causes a significant performance drop on an unrelated creative writing task, our method preserves over 95% of the base model's fluency, as measured by perplexity. Our work presents a new paradigm for modular skill injection, enabling the enhancement of specialized capabilities in LLMs without compromising their foundational generality.
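To make the state-space framing concrete, here is a minimal sketch of what an ASM-style dynamic intervention might look like. The abstract does not specify the parameterization, so the linear state-space maps `A`, `B`, `C`, the `tanh` nonlinearity, the steering gain `alpha`, and the per-token update scheme below are all illustrative assumptions, not the authors' exact method:

```python
# Illustrative ASM-style dynamic steering sketch (assumed parameterization).
# A small learned state-space model tracks a latent "reasoning state" over the
# generation and emits a state-dependent correction to the LLM's hidden states,
# in contrast to stateless steering, which adds one constant vector everywhere.
import torch
import torch.nn as nn

class ActivationStateMachineSketch(nn.Module):
    def __init__(self, hidden_dim: int, state_dim: int = 64, alpha: float = 0.1):
        super().__init__()
        self.A = nn.Linear(state_dim, state_dim, bias=False)   # latent dynamics
        self.B = nn.Linear(hidden_dim, state_dim, bias=False)  # activation -> state
        self.C = nn.Linear(state_dim, hidden_dim, bias=False)  # state -> correction
        self.alpha = alpha                                     # steering strength
        self.state = None

    def reset(self):
        # Clear the latent state before each new generation.
        self.state = None

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, hidden_dim) activation at one layer for the current token.
        if self.state is None:
            self.state = torch.zeros(hidden.size(0), self.A.in_features,
                                     device=hidden.device, dtype=hidden.dtype)
        # Advance the latent state from its own dynamics plus the observed activation.
        self.state = torch.tanh(self.A(self.state) + self.B(hidden))
        # Apply a history-dependent corrective intervention to the hidden state.
        return hidden + self.alpha * self.C(self.state)
```

At inference time, such a module could plausibly be attached as a forward hook on a chosen transformer layer, updating its latent state once per generated token; the hook placement and training objective (e.g., regressing toward activations from ideal reasoning traces) are likewise assumptions here.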
Primary Area: generative models
Submission Number: 14368