SkillFactory: Self-Distillation for Learning Cognitive Behaviors

ICLR 2026 Conference Submission14888 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Large Language Models, Reasoning, Reinforcement Learning
TL;DR: We present SkillFactory, a pipeline for priming language models with cognitive reasoning skills that enhance reinforcement learning and improve downstream performance.
Abstract: Reasoning models leveraging long chains of thought employ various cognitive skills such as verification of their answers, backtracking, retrying with an alternate method, and more. Previous work has shown that when base models exhibit these skills, a reasoning model trained by reinforcement learning (RL) can learn to leverage them. How can we get models to leverage skills that aren't exhibited by base models? Our work, SkillFactory, is a method for fine-tuning models to roughly learn these skills during a supervised fine-tuning (SFT) stage prior to RL. Our approach does not rely on distillation from a stronger model, but instead uses samples from the model itself, rearranged to provide training data in the format of those skills. These "silver" SFT traces may contain errors, but are nevertheless effective for priming a model to acquire skills during RL. Our evaluation shows that (1) starting from SkillFactory initialization helps the post-RL model generalize to harder variants of the task; (2) the cognitive skills are indeed used by the model; and (3) the presence of these skills enables techniques such as budget forcing (driving a model to think longer) that are unavailable to our baselines.
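To make the idea of rearranging a model's own samples into skill-formatted "silver" traces concrete, the sketch below shows one plausible construction for a verify-and-retry skill. This is an illustrative assumption, not the authors' released pipeline: the helper names (`sample_solution`, `build_silver_trace`, `make_sft_dataset`) and the trace template are hypothetical.

```python
# Minimal sketch (assumed, not the paper's implementation): turn a model's own
# samples into "silver" SFT traces that exhibit a verify/backtrack/retry format.

def sample_solution(model, problem, seed):
    """Placeholder: sample one chain-of-thought + answer from the base model."""
    raise NotImplementedError

def build_silver_trace(problem, first_attempt, second_attempt):
    """Rearrange two self-generated attempts into a single trace demonstrating
    'attempt -> verify -> backtrack -> retry' behavior. Either attempt may be
    wrong; the trace teaches the *format* of the skill, not correctness."""
    return (
        f"Problem: {problem}\n"
        f"Attempt: {first_attempt}\n"
        "Wait, let me verify this answer... it does not check out.\n"
        "Let me try a different approach.\n"
        f"Attempt: {second_attempt}\n"
    )

def make_sft_dataset(model, problems, n_samples=4):
    """Build a small SFT dataset of silver traces from the model's own samples."""
    dataset = []
    for problem in problems:
        attempts = [sample_solution(model, problem, seed=s) for s in range(n_samples)]
        if len(attempts) >= 2:
            dataset.append(build_silver_trace(problem, attempts[0], attempts[1]))
    return dataset
```

Under this reading, SFT on such traces primes the skill's surface form, and RL is then responsible for learning when the skill is actually useful.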
Primary Area: foundation or frontier models, including LLMs
Submission Number: 14888