Keywords: large language models, distillation, reinforcement learning
TL;DR: We present a novel distillation method based on agentic interaction for LLM reasoning.
Abstract: Recent advancements in large language models (LLMs) have demonstrated remarkable reasoning abilities on complex tasks, propelling progress toward artificial general intelligence (AGI). However, these gains come with significant computational costs, limiting their practical deployment. A promising direction is to distill reasoning skills from larger teacher models into smaller, more efficient student models, yet existing data-centric distillation approaches suffer from passive learning, over-learning on simple tasks, and persistent knowledge gaps. To overcome these limitations, we introduce Agentic Distillation, a novel framework for adaptive and active distillation. In Agentic Distillation, student LLMs interact with teacher LLMs modeled as environments, receiving feedback tokens that guide their reasoning process and selectively updating their capabilities when necessary. To address the off-policy and gradient-vanishing challenges introduced by feedback tokens, we devise a tailored importance sampling and clipping strategy within a unified objective that both incentivizes reasoning and injects knowledge into student LLMs. Extensive experiments show that Agentic Distillation significantly enhances reasoning performance while improving efficiency, offering a scalable path for equipping compact LLMs with advanced reasoning abilities.
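The abstract names an importance sampling and clipping strategy over trajectories that mix student-generated tokens with off-policy teacher feedback tokens, but does not give the objective in closed form. Below is a minimal, hypothetical sketch of one plausible instantiation, assuming a PPO-style clipped surrogate on student-generated tokens and a log-likelihood (distillation) term on masked teacher feedback tokens; the function and argument names are illustrative, not the authors' actual formulation.

```python
import torch

def agentic_distillation_loss(student_logprobs, old_logprobs, advantages,
                              feedback_mask, clip_eps=0.2):
    """Sketch of a unified objective over mixed student/teacher trajectories.

    All tensors have shape [batch, seq_len]; names are assumptions:
        student_logprobs: log-probs under the current student policy.
        old_logprobs:     log-probs under the behavior (sampling-time) policy.
        advantages:       per-token advantage estimates.
        feedback_mask:    1.0 for teacher-injected feedback tokens (off-policy),
                          0.0 for student-generated tokens.
    """
    # Importance ratio between the current and behavior policy.
    ratio = torch.exp(student_logprobs - old_logprobs)

    # PPO-style clipped surrogate, applied only to student-generated tokens.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    policy_term = torch.min(unclipped, clipped)

    # Teacher feedback tokens are off-policy: instead of weighting them by the
    # importance ratio (which can vanish or explode), distill them directly
    # via their log-likelihood under the student.
    distill_term = student_logprobs

    on_policy = 1.0 - feedback_mask
    loss = -(on_policy * policy_term + feedback_mask * distill_term)
    return loss.mean()
```

The key design choice in this sketch is the per-token mask: it keeps the reinforcement signal on-policy while still injecting the teacher's knowledge through a supervised term, which is one way to reconcile the two goals the abstract attributes to the unified objective.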
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 17783