Abstract: Chain-of-Thought (CoT) prompting significantly enhances the reasoning of Large Language Models (LLMs), but distilling complex CoT into small models remains challenging. Naive distillation often yields limited gains or even degrades small-model performance, likely due to capacity constraints. Existing methods for improving small-model reasoning mainly rely on costly, often impractical ground-truth answers for data selection, and they fail to balance performance against output length. To address this, we propose \textbf{MORALE}, a seg\textbf{M}ent-guided distillati\textbf{O}n framewo\textbf{R}k for sm\textbf{AL}l r\textbf{E}asoning models. MORALE enables small models to learn effectively from complex CoT knowledge without requiring ground-truth verification, maintaining high performance with remarkably short outputs. Specifically, MORALE divides reasoning trajectories into independent segments complemented by a summary, making it \textit{dataset-agnostic}. An integrated RS2DPO module further boosts model potential while keeping the thinking concise. Extensive experiments demonstrate that MORALE substantially improves small-model reasoning, achieving an average gain of 36.93\% while reducing output length by 65.86\% compared with conventional long-CoT distillation.
Paper Type: Long
Research Area: Question Answering
Research Area Keywords: chain-of-thought, reasoning, generalization
Contribution Types: Approaches to low-resource settings, Data analysis
Languages Studied: English
Submission Number: 6121