Keywords: Test Time Scaling, Reasoning Steering, Vector Steering, Sparse Autoencoders, LLM Reasoning
Abstract: A common Test-Time Scaling (TTS) strategy for Large Language Model (LLM) reasoning is to allocate additional computation during inference to generate longer Chains-of-Thought (CoTs).
However, simply scaling CoT length often introduces redundancy and aimless exploration, which can paradoxically degrade performance.
We argue that effective TTS requires a shift from merely lengthening reasoning to actively steering the reasoning trajectory, thereby directing additional computation toward productive reasoning.
To this end, we propose SAE-Scaling, a framework for fine-grained control over an LLM's reasoning trajectory.
SAE-Scaling first employs Sparse Autoencoders (SAEs) to identify and disentangle interpretable features associated with five key reasoning strategies: *Problem Understanding*, *Procedural Planning*, *Backtracking*, *Multi-perspective Verification*, and *Hypothesis Reasoning*.
Next, we train a lightweight strategy router that dynamically selects a reasoning strategy at each step of the trajectory.
By actively manipulating the strategy-specific features during generation, SAE-Scaling steers the CoT to follow a target reasoning strategy, thereby channeling the additional computation toward more productive reasoning.
Experiments on three LLMs across three challenging reasoning benchmarks show a 68\% average success rate in controlling reasoning strategies alongside an average absolute accuracy gain of 3.6\% over the vanilla baseline, highlighting the effectiveness of SAE-Scaling.
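As a concrete illustration of the mechanism the abstract describes, here is a minimal sketch of strategy routing plus SAE-feature steering. All names, shapes, and the steering coefficient are hypothetical; the paper's actual router architecture, SAE placement, and hyperparameters are not specified in the abstract.

```python
# Hypothetical sketch: a lightweight router picks one of the five reasoning
# strategies from the current hidden state, and the chosen strategy's SAE
# decoder direction is added to the residual stream to steer generation.
import torch
import torch.nn as nn

STRATEGIES = [
    "problem_understanding",
    "procedural_planning",
    "backtracking",
    "multi_perspective_verification",
    "hypothesis_reasoning",
]

class StrategyRouter(nn.Module):
    """Lightweight router: scores the five strategies from a hidden state."""
    def __init__(self, d_model: int, n_strategies: int = len(STRATEGIES)):
        super().__init__()
        self.proj = nn.Linear(d_model, n_strategies)

    def forward(self, h: torch.Tensor) -> int:
        # h: (d_model,) hidden state at the current reasoning step.
        return int(self.proj(h).argmax())

def steer(h: torch.Tensor, decoder_dirs: torch.Tensor, idx: int,
          alpha: float = 4.0) -> torch.Tensor:
    """Add the selected strategy's SAE decoder direction (unit-normalized,
    scaled by a hypothetical coefficient alpha) to the residual stream."""
    d = decoder_dirs[idx]
    return h + alpha * d / d.norm()

# Toy usage with random weights (stands in for a trained router and SAE).
d_model = 4096
router = StrategyRouter(d_model)
decoder_dirs = torch.randn(len(STRATEGIES), d_model)  # one SAE feature per strategy
h = torch.randn(d_model)
idx = router(h)
h_steered = steer(h, decoder_dirs, idx)
print(STRATEGIES[idx], h_steered.shape)
```

This mirrors standard activation-steering practice: the router reads the hidden state at each reasoning step, and the selected strategy's feature direction is injected before subsequent tokens are generated; it is a sketch under these assumptions, not the paper's implementation.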
Primary Area: interpretability and explainable AI
Submission Number: 18408