Focus on This, Not That! Steering LLMs with Adaptive Feature Specification

Published: 01 Jul 2025 · Last Modified: 07 Jul 2025 · ICML 2025 R2-FM Workshop Poster · CC BY 4.0
Keywords: Instruction Tuning, LLMs, Spurious Correlations
TL;DR: We introduce Focus Instruction Tuning (FIT), a method that trains LLMs to adaptively condition their task behaviour on specified features, making them steerable and controllable through feature specification.
Abstract: Despite the success of Instruction Tuning (IT) in training large language models (LLMs), such models often leverage spurious or biased features learnt from their training data and can become misaligned, leading to undesired behaviours. While existing techniques can steer model behaviour at inference time, they are often post-hoc and do not embed steering as an intrinsic model capability. In this work, we introduce Focus Instruction Tuning (FIT), which trains LLMs to condition their responses by focusing on specific features whilst ignoring others, so that behaviour changes depending on which features are specified. Across diverse benchmarks, we demonstrate that FIT: (i) steers behaviour at inference time; (ii) increases robustness by amplifying core task signals and down-weighting spurious cues; (iii) mitigates social bias by suppressing demographic attributes; and (iv) generalises to distribution shifts and previously unseen focus features. FIT therefore offers a lightweight, intrinsic mechanism for building robust, fair, and easily controllable LLMs suitable for real-world deployment.
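To make the idea concrete, below is a minimal illustrative sketch of how a FIT-style "focus instruction" might be composed around a task prompt, telling the model which features to attend to and which to ignore. The function name, template wording, and the example spurious feature are assumptions for illustration only; the paper's actual instruction format may differ.

```python
# Hypothetical sketch of a FIT-style focus instruction (not the paper's exact template).


def build_focus_prompt(
    task_instruction: str,
    input_text: str,
    focus_features: list[str] | None = None,
    ignore_features: list[str] | None = None,
) -> str:
    """Compose a prompt that specifies which features to focus on and which to ignore."""
    lines = [task_instruction]
    if focus_features:
        lines.append("Focus on the following features: " + ", ".join(focus_features) + ".")
    if ignore_features:
        lines.append("Ignore the following features: " + ", ".join(ignore_features) + ".")
    lines.append(f"Input: {input_text}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Example: steer a sentiment classifier away from a hypothetical spurious cue
    # (mentions of a particular director) and toward the core task signal.
    prompt = build_focus_prompt(
        task_instruction="Classify the sentiment of the review as positive or negative.",
        input_text="The plot was thin, but the director's cameo was fun.",
        focus_features=["the overall sentiment expressed in the text"],
        ignore_features=["mentions of the director"],  # assumed spurious feature
    )
    print(prompt)
```

Under FIT as described in the abstract, prompts like this (with varying focus/ignore specifications) would be used during instruction tuning so that, at inference time, changing the specified features changes the model's behaviour.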
Submission Number: 60