On the Effectiveness of Trainable Steering Vectors in Supervised Fine-Tuning

ACL ARR 2026 January Submission10626 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: steering, reasoning, large language models, reinforcement learning, parameter-efficient fine-tuning
Abstract: Prior work showed that trainable steering vectors can match full-model reinforcement learning fine-tuning on mathematical reasoning benchmarks while updating only a tiny fraction of parameters. We test whether this equivalence extends to supervised fine-tuning (SFT) and identify the factors that control it. Using Qwen2.5-Math-7B and Llama-3.1-8B-Instruct trained on OpenThoughts-114k-math and evaluated on six math benchmarks, we find a consistent gap under SFT: steering vectors underperform full-model fine-tuning, unlike in the RL setting. We show that SFT-trained steering models deviate more from their base models, and that closing the gap by increasing adapter capacity requires full-rank updates (e.g., $\approx 27\%$ of parameters for Qwen2.5-Math-7B when scaling \texttt{MLP.down\_proj} LoRA). Finally, we show that changing the data removes the gap: when we distill from RL-trained teachers by training on selected positive generations, low-parameter steering vectors match full-model fine-tuning, without simply reproducing RL steering directions.
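The "tiny fraction of parameters" claim in the abstract can be made concrete with a back-of-the-envelope count. The sketch below is illustrative only: the layer count and hidden size are assumptions typical of Qwen2.5-7B-class models, not figures taken from the paper.

```python
# Hypothetical parameter budget: one trainable additive steering vector per
# transformer layer, compared against full-model fine-tuning of a ~7B model.
# ASSUMED shapes (not from the paper): 28 layers, hidden size 3584.

n_layers = 28
d_model = 3584
total_params = 7_000_000_000  # ~7B parameters updated by full fine-tuning

steering_params = n_layers * d_model  # one d_model-dim vector per layer
fraction = steering_params / total_params

print(f"steering params: {steering_params:,}")
print(f"fraction of full model: {fraction:.6%}")
```

Under these assumed shapes, per-layer steering vectors amount to roughly a hundred thousand parameters, orders of magnitude below both full fine-tuning and the $\approx 27\%$ full-rank LoRA budget the abstract reports as necessary to close the SFT gap.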
Paper Type: Short
Research Area: Mathematical, Symbolic, Neurosymbolic, and Logical Reasoning
Research Area Keywords: knowledge tracing/discovering/inducing, parameter-efficient-training, chain-of-thought, reasoning
Contribution Types: NLP engineering experiment, Reproduction study
Languages Studied: English
Submission Number: 10626