Keywords: Efficient reasoning models, sparse autoencoders
Abstract: How cost-effectively can we elicit strong reasoning abilities in language models by leveraging their underlying representations? We present Resa, a family of reasoning models trained via an efficient sparse autoencoder tuning (SAE-Tuning) procedure. This method first trains an SAE to capture reasoning abilities from a source model, and then uses the trained SAE to guide a standard supervised fine-tuning process to elicit such abilities in a target model, all using verified question-answer data without any reasoning traces. When applied to certain Qwen-style models before further RL training, SAE-Tuning retains 97\% of its RL-trained counterpart's performance while reducing training costs by 2000x to roughly \$1 and training time by 450x to around 20 minutes. Furthermore, even at the 1.5B model size, SAE-Tuning on lightly RL-trained models delivers strong reasoning results, reaching 43.33\% Pass@1 on AIME24 and 90\% Pass@1 on AMC23. We also show that SAE-Tuning works for Llama-style models, boosting their scores by over 10\% on tasks such as AMC23 and MATH500. Surprisingly, the reasoning abilities extracted via SAEs are potentially both generalizable and modular. Generality means that abilities extracted from one dataset still improve performance on a larger, overlapping corpus. Modularity means that abilities extracted from models such as Qwen or Qwen-Math can be attached to the R1-Distilled Qwen model at test time, without any retraining, and yield comparable gains. Extensive ablations validate these findings, and all artifacts are fully open-sourced.
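The two-stage procedure described in the abstract can be illustrated with a toy sketch. This is a hypothetical simplification, not the paper's implementation: it trains a small sparse autoencoder (ReLU encoder, linear decoder, L1 sparsity penalty) on stand-in "hidden states" of a source model, then shows how the frozen SAE's feature activations could supply an auxiliary guidance term during supervised fine-tuning of a target model. All names, dimensions, and the guidance loss form are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch of SAE-Tuning (NOT the paper's code).
# Step 1: train a sparse autoencoder (SAE) on source-model hidden states.
# Step 2: use the frozen SAE's features as an auxiliary guidance signal
#         during supervised fine-tuning of a target model.

rng = np.random.default_rng(0)
d_model, d_sae, n = 16, 64, 256
H_source = rng.normal(size=(n, d_model))  # stand-in for source hidden states

# SAE parameters: ReLU encoder W_e, linear decoder W_d.
W_e = rng.normal(scale=0.1, size=(d_model, d_sae))
W_d = rng.normal(scale=0.1, size=(d_sae, d_model))
lr, l1 = 0.5, 1e-3  # learning rate and L1 sparsity coefficient (illustrative)

def sae_forward(H):
    Z = np.maximum(H @ W_e, 0.0)  # sparse feature activations
    return Z, Z @ W_d             # reconstruction of the hidden states

def sae_loss(H):
    Z, H_hat = sae_forward(H)
    return np.mean((H - H_hat) ** 2) + l1 * np.mean(np.abs(Z))

# Step 1: plain gradient descent on reconstruction + sparsity loss.
initial = sae_loss(H_source)
for _ in range(300):
    Z, H_hat = sae_forward(H_source)
    dH_hat = 2.0 * (H_hat - H_source) / H_source.size
    gW_d = Z.T @ dH_hat
    dZ = (dH_hat @ W_d.T + l1 * np.sign(Z) / Z.size) * (Z > 0)
    gW_e = H_source.T @ dZ
    W_d -= lr * gW_d
    W_e -= lr * gW_e
final = sae_loss(H_source)

# Step 2 (sketch): during SFT, an auxiliary loss could pull the target
# model's SAE features toward those captured from the source model.
H_target = rng.normal(size=(n, d_model))  # stand-in target hidden states
Z_source, _ = sae_forward(H_source)
Z_target, _ = sae_forward(H_target)
guidance_loss = np.mean((Z_target - Z_source) ** 2)  # hypothetical term
```

In this sketch the guidance term would be added to the standard cross-entropy SFT loss; the abstract's claim that no reasoning traces are needed corresponds to the fact that the SAE is trained only on hidden states elicited by verified question-answer data.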
Submission Number: 125