Controllable Test-Time Scaling via Sparse Autoencoder‑Based Reasoning Steering

ICLR 2026 Conference Submission18408 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Test Time Scaling, Reasoning Steering, Vector Steering, Sparse Autoencoders, LLM Reasoning
Abstract: A common Test-Time Scaling (TTS) strategy for Large Language Model (LLM) reasoning is to allocate additional computation during inference to generate longer Chains-of-Thought (CoTs). However, simply scaling CoT length often introduces redundancy and aimless exploration, which can paradoxically degrade performance. We argue that effective TTS requires a shift from merely lengthening reasoning to actively steering the reasoning trajectory, thereby directing additional computation toward productive reasoning. To this end, we propose SAE-Scaling, a framework for fine-grained control over an LLM's reasoning trajectory. SAE-Scaling first employs Sparse Autoencoders (SAEs) to identify and disentangle interpretable features associated with five key reasoning strategies: *Problem Understanding*, *Procedural Planning*, *Backtracking*, *Multi-perspective Verification*, and *Hypothesis Reasoning*. Next, we train a lightweight strategy router that dynamically chooses a reasoning strategy at each step of the reasoning trajectory. By actively manipulating the strategy-specific features during generation, SAE-Scaling steers the CoT to follow the target reasoning strategy, channeling the additional computation into more productive reasoning. Experiments on three LLMs across three challenging reasoning benchmarks show a 68\% average success rate in controlling reasoning strategies alongside an average absolute accuracy gain of 3.6\% over the vanilla baseline, highlighting the effectiveness of SAE-Scaling.
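The core mechanism described in the abstract (clamping a strategy-specific SAE feature and adding its decoder direction back into the residual stream) can be sketched as follows. This is a minimal illustration with toy random weights, not the authors' implementation: the SAE dimensions, the ReLU encoder, and the `sae_steer` helper are all assumptions for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 8, 32  # toy sizes; real SAEs use thousands of features

# Hypothetical SAE weights (a real SAE is trained on LLM hidden activations).
W_enc = rng.normal(size=(d_model, d_sae))
W_dec = rng.normal(size=(d_sae, d_model))
b_enc = np.zeros(d_sae)

def sae_steer(h, feature_idx, strength):
    """Clamp one SAE feature to a target strength and edit the hidden state.

    Encodes h into sparse features, measures how far the chosen feature is
    from the target activation, and adds that feature's decoder direction
    (scaled by the gap) back into h.
    """
    feats = np.maximum(h @ W_enc + b_enc, 0.0)  # ReLU encoder -> sparse codes
    delta = strength - feats[feature_idx]
    return h + delta * W_dec[feature_idx]

h = rng.normal(size=d_model)
h_steered = sae_steer(h, feature_idx=5, strength=4.0)
```

In a full system, `sae_steer` would run inside a forward hook on a chosen transformer layer, with `feature_idx` selected per reasoning step by the strategy router.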
Primary Area: interpretability and explainable AI
Submission Number: 18408