DA-CoTD: Efficient Chain-of-Thought Reasoning with Difficulty-Aware CoT-Distillation

Published: 16 Oct 2025 · Last Modified: 10 Nov 2025 · NeurIPS 2025 ER Workshop · CC BY 4.0
Keywords: Large Reasoning Models, Efficiency, Distillation, Chain-of-thought
TL;DR: Reducing the token count of reasoning traces by compressing them according to problem difficulty.
Abstract: Chain-of-thought (CoT) prompting improves reasoning in large language models (LLMs) but often produces overly verbose traces, leading to inefficiency at inference time. This issue is amplified in multimodal reasoning, where images require a greater token budget, simple problems require little reasoning, and complex ones demand detailed cross-modal chains. We propose \textit{Difficulty-Aware CoT Distillation} (DA-CoTD), a framework that adapts reasoning length to input complexity. Using an LLM-based grader aligned with AoPS difficulty ratings, we compress verbose CoT traces into difficulty-aligned ones and fine-tune multimodal models via supervised fine-tuning (SFT) and direct preference optimization (DPO). Experiments on seven multimodal math benchmarks show that DA-CoTD reduces reasoning tokens by up to 30\% while maintaining or improving accuracy, outperforming strong baselines.
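To make the difficulty-aware compression step concrete, the sketch below shows one way such a pipeline could look. It is not the paper's code: the linear `token_budget` mapping, the 1-to-10 difficulty scale, and the `grade_difficulty` / `compress_trace` callables are all illustrative placeholders for the LLM-based grader and compressor described in the abstract.

```python
# Hypothetical sketch of difficulty-aware CoT compression targets (not the authors' implementation).
# Assumes an AoPS-style 1-10 difficulty scale and a simple linear token-budget rule.

from dataclasses import dataclass


@dataclass
class TrainingExample:
    problem: str
    compressed_cot: str
    answer: str


def token_budget(difficulty: float, min_tokens: int = 64, max_tokens: int = 1024) -> int:
    """Map a 1-10 difficulty rating to a reasoning-token budget (illustrative linear rule)."""
    difficulty = min(max(difficulty, 1.0), 10.0)
    frac = (difficulty - 1.0) / 9.0
    return int(min_tokens + frac * (max_tokens - min_tokens))


def build_sft_example(problem, verbose_cot, answer, grade_difficulty, compress_trace):
    """Compress a verbose CoT trace to a difficulty-aligned length for SFT.

    `grade_difficulty(problem) -> float` and `compress_trace(cot, max_tokens) -> str`
    are placeholders for the LLM-based grader and compressor the abstract refers to.
    """
    difficulty = grade_difficulty(problem)      # e.g. 2.5 for an easy problem
    budget = token_budget(difficulty)           # short budget for easy, long for hard
    compressed = compress_trace(verbose_cot, budget)
    return TrainingExample(problem, compressed, answer)
```

The resulting (problem, compressed CoT, answer) triples would serve as SFT targets; a preference pair of verbose versus difficulty-aligned traces could likewise be used for DPO, under the same stated assumptions.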
Submission Number: 110