DynaGuide: Efficient Reasoning Capability Distillation via Adaptive Data Curation and Overthinking Mitigation
Abstract: The exponentially increasing computational demands of large language models (LLMs) motivate distilling their capabilities into smaller models. Existing attempts to transfer LLMs' reasoning capabilities to compact models face critical limitations: expensive training or annotation costs, suboptimal data selection, and flawed synthetic data caused by LLMs' general tendency to overthink. This paper introduces DynaGuide, a novel framework that optimizes the distillation process for both efficiency and performance. Our approach integrates (1) Dynamic Data Selection, which adaptively performs fine-grained selection of valuable data during training, and (2) Reasoning Pattern Guidance, which mitigates overthinking in synthetic data by incorporating specialized guidance during fine-tuning. Extensive experiments demonstrate that DynaGuide enables a 7B-parameter model to achieve superior performance on knowledge-reasoning question-answering benchmarks, matching or even exceeding its 32B counterpart. Our systematic ablation studies and analysis further reveal insights into distillation and reasoning.
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: distillation, data-efficient training
Contribution Types: NLP engineering experiment, Approaches low compute settings-efficiency
Languages Studied: English
Submission Number: 4771