Keywords: representation engineering, mechanistic interventions, model steering, large language models, activation addition, function vectors
TL;DR: We derive an optimal method for affine steering of representations in LLMs that improves over prevalent additive methods, and we provide empirical results that demonstrate improved control and flexibility across multiple tasks.
Abstract: Controlling and understanding the internal representations of large language models (LLMs) remain central challenges. We combine conceptor theory with activation steering to develop a principled framework for provably optimal affine steering of LLM activations. Conceptors compress sets of activation vectors and act as soft projection matrices, enabling precise and interpretable control over internal states. Our framework derives optimal steering functions from first principles and consistently outperforms additive steering across in-context learning tasks and alignment-relevant behavior. We further demonstrate how Boolean operations over conceptors allow for compositional steering toward multiple objectives, yielding better performance than traditional vector combination methods. Together, these results establish conceptor-based steering as a powerful tool for both controlling LLM behavior and gaining insight into their internal mechanisms. We will release our code and data as part of a flexible open-source library for activation steering.
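The conceptor construction the abstract refers to can be sketched with the standard closed form from Jaeger's conceptor theory, C = R(R + α⁻²I)⁻¹, where R is the correlation matrix of the cached activations; the paper's exact steering function and Boolean composition rules may differ from this minimal NumPy sketch, and the aperture value and toy dimensions below are illustrative assumptions.

```python
import numpy as np

def conceptor(X, alpha=10.0):
    """Soft projection matrix for a set of activation vectors.

    X: (n_samples, d) matrix of cached activations.
    alpha: aperture; larger alpha pushes C toward a hard projection.
    """
    n, d = X.shape
    R = X.T @ X / n                                  # correlation matrix (d, d)
    return R @ np.linalg.inv(R + alpha ** -2 * np.eye(d))

# Boolean algebra over conceptors (Jaeger's formulas); these closed forms
# assume the relevant matrices are invertible, which holds for full-rank R.
def neg(C):
    return np.eye(len(C)) - C

def conj(C, B):                                      # C AND B
    I = np.eye(len(C))
    return np.linalg.inv(np.linalg.inv(C) + np.linalg.inv(B) - I)

def disj(C, B):                                      # C OR B, via De Morgan
    return neg(conj(neg(C), neg(B)))

# Toy example: build a conceptor from random "activations" and apply it
# as a soft projection to a hidden state.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
C = conceptor(X)

h = rng.normal(size=8)
h_steered = C @ h  # hidden state softly projected onto the captured subspace
```

By construction the eigenvalues of C lie in [0, 1), so applying C attenuates directions outside the captured activation subspace rather than zeroing them out, which is what distinguishes this soft projection from a hard subspace projector.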
Supplementary Material: zip
Primary Area: Social and economic aspects of machine learning (e.g., fairness, interpretability, human-AI interaction, privacy, safety, strategic behavior)
Submission Number: 27939