Abstract: Chain-of-Thought (CoT) prompting significantly enhances the reasoning of Large Language Models (LLMs), but distilling complex CoT into small models remains challenging. Naive distillation often yields limited gains or even degrades small-model performance, likely due to capacity constraints. Existing methods for improving small-model reasoning mainly rely on costly, often impractical ground-truth answers for data selection, and they fail to balance performance against output length. To address this, we propose \textbf{MORALE}, a seg\textbf{M}ent-guided distillati\textbf{O}n framewo\textbf{R}k for sm\textbf{AL}l r\textbf{E}asoning models. MORALE enables small models to learn effectively from complex CoT knowledge without requiring ground-truth verification, maintaining high performance with remarkably short outputs. Specifically, MORALE divides reasoning trajectories into independent segments complemented by a summary, making it \textit{dataset-agnostic}. An integrated RS2DPO module further boosts model potential while keeping the thinking concise. Extensive experiments demonstrate that MORALE substantially improves small-model reasoning, achieving an average gain of 36.93\% while reducing output length by 65.86\% compared with conventional long-CoT distillation.
Paper Type: Long
Research Area: Question Answering
Research Area Keywords: chain-of-thought, reasoning, generalization
Contribution Types: Approaches to low-resource settings, Data analysis
Languages Studied: English
Submission Number: 6121