Keywords: large language models, distillation, reinforcement learning
TL;DR: We present a novel distillation method based on agentic interaction for LLM reasoning.
Abstract: Recent advancements in large language models (LLMs) have demonstrated remarkable reasoning abilities on complex tasks, propelling progress toward artificial general intelligence (AGI). However, these gains come with significant computational costs, limiting their practical deployment. A promising direction is to distill reasoning skills from larger teacher models into smaller, more efficient student models, yet existing data-centric distillation approaches suffer from passive learning, over-learning on simple tasks, and persistent knowledge gaps. To overcome these limitations, we introduce Agentic Distillation, a novel framework for adaptive and active distillation. In Agentic Distillation, student LLMs interact with teacher LLMs modeled as environments, receiving feedback tokens that guide their reasoning process and selectively updating their capabilities when necessary. To address the off-policy and gradient-vanishing challenges introduced by feedback tokens, we devise a tailored importance sampling and clipping strategy within a unified objective that both incentivizes reasoning and injects knowledge into student LLMs. Extensive experiments show that Agentic Distillation significantly enhances reasoning performance while improving efficiency, offering a scalable path for equipping compact LLMs with advanced reasoning abilities.
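The abstract names an importance sampling and clipping strategy over trajectories that mix student-generated tokens with off-policy teacher feedback tokens, but does not give the objective in closed form. Below is a minimal, hypothetical sketch of one plausible instantiation, assuming a PPO-style clipped surrogate on student-generated tokens and a log-likelihood (distillation) term on masked teacher feedback tokens; the function and argument names are illustrative, not the authors' actual formulation.

```python
import torch

def agentic_distillation_loss(student_logprobs, old_logprobs, advantages,
                              feedback_mask, clip_eps=0.2):
    """Sketch of a unified objective over mixed student/teacher trajectories.

    All tensors have shape [batch, seq_len]; names are assumptions:
        student_logprobs: log-probs under the current student policy.
        old_logprobs:     log-probs under the behavior (sampling-time) policy.
        advantages:       per-token advantage estimates.
        feedback_mask:    1.0 for teacher-injected feedback tokens (off-policy),
                          0.0 for student-generated tokens.
    """
    # Importance ratio between the current and behavior policy.
    ratio = torch.exp(student_logprobs - old_logprobs)

    # PPO-style clipped surrogate, applied only to student-generated tokens.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    policy_term = torch.min(unclipped, clipped)

    # Teacher feedback tokens are off-policy: instead of weighting them by the
    # importance ratio (which can vanish or explode), distill them directly
    # via their log-likelihood under the student.
    distill_term = student_logprobs

    on_policy = 1.0 - feedback_mask
    loss = -(on_policy * policy_term + feedback_mask * distill_term)
    return loss.mean()
```

The key design choice in this sketch is the per-token mask: it keeps the reinforcement signal on-policy while still injecting the teacher's knowledge through a supervised term, which is one way to reconcile the two goals the abstract attributes to the unified objective.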
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 17783