Instruction Following by Principled Attention Boosting of Large Language Models

Published: 02 Mar 2026, Last Modified: 02 Mar 2026
Venue: Sci4DL 2026
Readers: Everyone
License: CC BY 4.0
Keywords: instruction following, attention steering, inference-time intervention, attention steering theory, LLM safety
TL;DR: We develop a theory for how attention steering improves instruction following, then use it to propose InstABoost, a simple inference-time method that improves instruction following without hurting generation quality.
Abstract: The behavior of large language models is often shaped by instructions such as system prompts, refusal boundaries, privacy constraints, and tool-use rules, all of which must hold at inference time. One training-free intervention for enforcing them is attention steering, which biases attention toward instruction tokens. In this work, we formalize instruction following as a competition between instruction rules and context-derived rules, with attention mediating which rules dominate; this formalization unifies existing attention-steering methods. We prove that boosting attention to instruction tokens tilts this competition toward the instruction, making it harder for context to override instruction following. However, excessive boosting can suppress task-relevant context that should be incorporated alongside the instruction. Guided by this theory, we propose Instruction Attention Boosting (InstABoost), a simple intervention that applies the same constant additive bias to the attention logits of all instruction keys.
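The additive bias described in the abstract can be illustrated with a minimal sketch for a single query position. This is not the authors' implementation: the function names, the boolean instruction mask, and the example values are hypothetical, and the abstract does not say which layers or heads receive the bias. Adding a constant b to the logits of the instruction keys before the softmax is equivalent to scaling their unnormalized attention weights by exp(b) and renormalizing, so a larger b shifts more attention mass onto the instruction.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def boost_instruction_attention(logits, instruction_mask, bias):
    """Add one constant bias to every instruction-key logit, then normalize.

    logits: pre-softmax attention scores of one query over all keys.
    instruction_mask: True where the key belongs to the instruction
        (hypothetical representation, assumed for this sketch).
    bias: the constant additive boost; 0.0 leaves attention unchanged.
    """
    if len(logits) != len(instruction_mask):
        raise ValueError("logits and instruction_mask must have equal length")
    boosted = [
        score + bias if is_instruction else score
        for score, is_instruction in zip(logits, instruction_mask)
    ]
    return softmax(boosted)


def instruction_mass(weights, instruction_mask):
    """Total attention weight placed on instruction keys."""
    return sum(w for w, is_instr in zip(weights, instruction_mask) if is_instr)


# Illustrative values only: two instruction keys followed by three context keys.
logits = [0.5, 1.0, 2.0, 1.5, 0.0]
mask = [True, True, False, False, False]

baseline = boost_instruction_attention(logits, mask, bias=0.0)
moderate = boost_instruction_attention(logits, mask, bias=2.0)
extreme = boost_instruction_attention(logits, mask, bias=20.0)
```

Comparing `instruction_mass` across the three settings shows the trade-off the abstract describes: a moderate bias raises the share of attention on the instruction while leaving some for the context, whereas a very large bias drives the context share toward zero.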
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Style Files: I have used the style files.
Submission Number: 60