Rethinking the Generation of High-Quality CoT Data from the Perspective of LLM-Adaptive Question Difficulty Grading
Abstract: Recently, DeepSeek-R1 (671B) has demonstrated excellent reasoning ability on complex tasks and has publicly shared its methodology. This provides potentially high-quality chain-of-thought (CoT) data for stimulating the reasoning abilities of small-sized large language models (LLMs). To generate such data for different LLMs, we propose an efficient method for constructing high-quality CoT data with LLM-Adaptive question difficulty levels. First, we grade question difficulty according to the reasoning ability of the target LLMs themselves and construct an LLM-Adaptive question database. Second, we sample from this question database according to a distribution over difficulty levels, and then use DeepSeek-R1 (671B) to generate the corresponding high-quality CoT data with correct answers. Thanks to this LLM-Adaptive construction of CoT data, we significantly reduce the cost of data generation and improve the efficiency of model supervised fine-tuning (SFT). Finally, we validate the effectiveness and generalizability of the proposed method on complex mathematical competitions and code generation tasks.
Notably, with only 2k high-quality mathematical CoT examples, our ZMath-32B surpasses DeepSeek-Distill-32B on math reasoning tasks. Similarly, with only 2k high-quality code CoT examples, our ZCode-32B surpasses DeepSeek-Distill-32B on code reasoning tasks.
Our ZMath-32B LLM also outperforms both the DeepSeek-Distill-32B and QwQ-32B models on general evaluation benchmarks.
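To make the pipeline concrete, below is a minimal, self-contained Python sketch of the two stages the abstract describes: grading question difficulty by the target model's own pass rate, then sampling per a difficulty distribution and keeping only answer-verified R1 traces. The helpers `base_model_solve` and `r1_generate_cot`, the pass-rate thresholds, and the sampling distribution are illustrative assumptions, not the paper's released implementation or values.

```python
import random
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Question:
    text: str
    gold_answer: str

def base_model_solve(q: Question) -> bool:
    """Placeholder: run the target small LLM on q and check its answer.
    Stubbed with a coin flip so the sketch runs end to end."""
    return random.random() < 0.5

def r1_generate_cot(q: Question) -> tuple[str, str]:
    """Placeholder: query DeepSeek-R1 (671B) for a CoT trace and final answer."""
    return f"<think>...reasoning about {q.text}...</think>", q.gold_answer

def grade_difficulty(questions, attempts=8):
    """Adaptive grading: bucket questions by the target LLM's own pass rate.
    The thresholds here are assumed, not taken from the paper."""
    buckets = defaultdict(list)
    for q in questions:
        rate = sum(base_model_solve(q) for _ in range(attempts)) / attempts
        level = "easy" if rate > 0.75 else "medium" if rate > 0.25 else "hard"
        buckets[level].append(q)
    return buckets

def build_cot_dataset(buckets, n=2000, dist=None):
    """Sample per a target difficulty distribution, then keep only
    R1 traces whose final answers match the gold answer."""
    if dist is None:
        dist = {"easy": 0.2, "medium": 0.4, "hard": 0.4}  # assumed mix
    data = []
    for level, frac in dist.items():
        k = min(int(n * frac), len(buckets[level]))
        for q in random.sample(buckets[level], k):
            cot, answer = r1_generate_cot(q)
            if answer == q.gold_answer:  # verified-correct CoT only
                data.append({"question": q.text, "cot": cot, "answer": answer})
    return data
```

Filtering to verified-correct traces is what keeps the resulting SFT set small (on the order of the 2k examples cited above) while remaining high quality; the difficulty distribution is the knob that adapts the data to the target model.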
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Resources and Evaluation, Question Answering, Efficient/Low-Resource Methods for NLP, Human-Centered NLP
Contribution Types: Approaches to low-resource settings, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
Submission Number: 1901