Critique-Guided Distillation for Efficient and Robust Language Model Reasoning

ICLR 2026 Conference Submission 21500 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: Large Language Models, Knowledge Distillation, Critique, Iterative Refinement, Reasoning

TL;DR: A simple yet powerful extension to supervised fine-tuning via critiques that teaches models not only what the correct answer is but also why it is correct.

Abstract: Supervised fine-tuning (SFT) with expert demonstrations often suffers from the imitation problem, where models reproduce correct responses without internalizing the underlying reasoning. We propose Critique-Guided Distillation (CGD), a multi-stage training framework that augments SFT with teacher-generated explanatory critiques and refined responses. Instead of directly imitating teacher outputs, a student learns to map the triplet of prompt, its own initial response, and teacher critique into the refined teacher response, thereby capturing both what to output and why. Our analyses show that CGD consistently reduces refinement uncertainty, improves alignment between critiques and responses, and enhances sample efficiency. On reasoning benchmarks, CGD achieves substantial gains across LLaMA and Qwen families, including +15.0% on AMC23 and +12.2% on MATH-500, while avoiding the format drift issues observed in prior critique-based fine-tuning. Importantly, on LLaMA-3.1-8B CGD approaches or exceeds the performance of SimpleRL-Zero, which is a DeepSeek-R1 replication, while requiring 60x less compute. Beyond reasoning, CGD maintains or improves general instruction-following and factual accuracy, matching baseline performance on IFEval, MUSR, TruthfulQA, and BBH. In contrast, prior critique-based methods degrade these capabilities (e.g., -21% on IFEval). Taken together, these results establish CGD as a robust and generalizable alternative to both conventional SFT and RL-based methods, offering a more efficient path toward advancing the reasoning and safety of large language models.
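The mapping described in the abstract, from (prompt, student's initial response, teacher critique) to the refined teacher response, can be illustrated with a minimal sketch of how one training example might be assembled. The abstract does not specify a prompt template, field names, or loss masking, so `CGDExample`, `build_training_pair`, the section headers in the template, and `loss_mask` are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class CGDExample:
    """One training record; field names are illustrative assumptions."""
    prompt: str
    student_response: str   # the student's own initial answer
    teacher_critique: str   # the teacher's explanatory critique of that answer
    refined_response: str   # the teacher's refined answer (the training target)


def build_training_pair(ex: CGDExample) -> tuple[str, str]:
    """Return (model_input, target) strings for supervised fine-tuning.

    The template below is an assumption, not the paper's actual format.
    """
    model_input = (
        f"### Prompt\n{ex.prompt}\n\n"
        f"### Initial response\n{ex.student_response}\n\n"
        f"### Critique\n{ex.teacher_critique}\n\n"
        f"### Refined response\n"
    )
    return model_input, ex.refined_response


def loss_mask(input_tokens: list, target_tokens: list) -> list[int]:
    """Mask so that only target tokens contribute to the loss.

    Masking the conditioning context is a common SFT convention and is
    assumed here; the abstract does not state which tokens are supervised.
    """
    return [0] * len(input_tokens) + [1] * len(target_tokens)
```

In this sketch the critique appears only in the conditioning context and the refined response is the sole supervised target, which is one plausible reading of "map the triplet ... into the refined teacher response."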

Supplementary Material: zip

Primary Area: foundation or frontier models, including LLMs

Submission Number: 21500
