Instruction Following by Boosting Attention of Large Language Models

Published: 30 Sept 2025 · Last Modified: 10 Nov 2025
Mech Interp Workshop (NeurIPS 2025) Spotlight · CC BY 4.0
Keywords: Steering, Understanding high-level properties of models
Other Keywords: steering, attention-based methods
TL;DR: We benchmark attention-based methods against latent steering for behavior control tasks, then propose a new attention-based inference-time intervention control method.
Abstract: Controlling the generation of large language models (LLMs) remains a central challenge to ensure they are both reliable and adaptable. Two common inference-time intervention approaches for this are instruction prompting, which provides natural language guidance, and latent steering, which directly modifies the model's internal activations to guide its behavior. Recently, attention manipulation methods have emerged that can enforce arbitrary user-provided instructions, representing a promising third approach for behavioral control. However, these methods have yet to be systematically compared against established approaches on complex behavioral tasks. Furthermore, existing methods suffer from critical limitations: they either require computationally expensive head selection or, as we show, risk degrading generation quality by over-focusing on instructions. To address the evaluation gap, we establish a unified benchmark comparing low-resource intervention approaches across 15 diverse behavioral control tasks. To address the technical limitations, we introduce Instruction Attention Boosting (InstABoost), a simple and efficient method that multiplicatively boosts attention to instruction tokens, avoiding the trade-offs of prior work. On our benchmark, InstABoost consistently outperforms or is competitive with all baselines, establishing attention manipulation as a robust method for behavioral control that preserves generation quality.
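The abstract describes InstABoost as multiplicatively boosting attention to instruction tokens. The following is a minimal sketch of that idea, not the paper's implementation: it assumes the boost is applied to post-softmax attention weights at the key positions of instruction tokens, followed by row renormalization. The function name, the `alpha` parameter, and the renormalization choice are illustrative assumptions.

```python
import numpy as np

def boost_instruction_attention(attn, instr_mask, alpha=2.0):
    """Multiplicatively boost attention paid to instruction tokens.

    attn: (num_queries, num_keys) post-softmax attention weights.
    instr_mask: boolean array of length num_keys marking instruction tokens.
    alpha: boost factor; alpha > 1 shifts attention mass toward the instruction.

    This is an illustrative sketch, not the authors' code: the boost is
    applied to the instruction-token columns and each row is renormalized
    so the weights still sum to one.
    """
    scale = np.where(instr_mask, alpha, 1.0)       # boost instruction columns
    boosted = attn * scale
    return boosted / boosted.sum(axis=-1, keepdims=True)  # renormalize rows

# Toy example: 2 queries over 4 keys; the first 2 keys are instruction tokens.
attn = np.array([[0.25, 0.25, 0.25, 0.25],
                 [0.10, 0.20, 0.30, 0.40]])
mask = np.array([True, True, False, False])
out = boost_instruction_attention(attn, mask, alpha=2.0)
```

After the boost, each query's total attention mass on the instruction tokens strictly increases (e.g. from 0.5 to 2/3 for the uniform first row), while rows remain valid probability distributions.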
Submission Number: 217