CoT-Self-Instruct: Building high-quality synthetic prompt data for reasoning and non-reasoning tasks
Keywords: synthetic data, chain of thought, self-instruct
TL;DR: We propose CoT-Self-Instruct, a new synthetic data creation and curation pipeline that leverages LLMs' planning and reasoning capabilities.
Abstract: We propose CoT-Self-Instruct, a synthetic data generation method that instructs LLMs to first reason and plan via Chain-of-Thought (CoT) based on given seed tasks, and then generate a new synthetic example of similar quality and complexity. This is followed by a filtering step to select high-quality data using automatic metrics, which are then used for LLM training. In verifiable reasoning, our synthetic data significantly outperforms existing training datasets, such as s1k and OpenMathReasoning, when evaluated on MATH500, AMC23, AIME24, and GPQA-Diamond. For non-verifiable instruction-following tasks, our method surpasses the performance of both human and standard Self-Instruct training data on the AlpacaEval 2.0 and Arena-Hard benchmarks.
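The pipeline in the abstract has two stages: CoT-guided generation of new examples from seed tasks, then filtering by automatic quality metrics. A minimal sketch of that loop follows; `call_llm`, the prompt wording, and `quality_score` are hypothetical stand-ins, since the paper's actual prompts, models, and filtering metrics are not specified here.

```python
def call_llm(prompt: str) -> str:
    # Stand-in for a real LLM API call; replace with your provider's client.
    return "REASONING: plan derived from seeds\nNEW PROMPT: a new synthetic task"

def generate_candidate(seed_tasks: list[str]) -> str:
    # Stage 1: ask the model to reason via CoT about the seed tasks,
    # then emit one new task of similar quality and complexity.
    prompt = (
        "Here are example tasks:\n"
        + "\n".join(f"- {t}" for t in seed_tasks)
        + "\nFirst reason step by step about their quality and complexity, "
          "then write ONE new task after the marker NEW PROMPT:."
    )
    reply = call_llm(prompt)
    # Keep only the generated task, discarding the chain-of-thought plan.
    return reply.split("NEW PROMPT:")[-1].strip()

def quality_score(task: str) -> float:
    # Stage 2 stand-in metric (e.g., answer consistency for verifiable
    # tasks, or a reward model score for non-verifiable ones).
    return float(len(task) > 10)

def cot_self_instruct(seed_tasks: list[str],
                      n_candidates: int = 4,
                      threshold: float = 0.5) -> list[str]:
    # Generate several candidates, keep those passing the quality filter.
    candidates = [generate_candidate(seed_tasks) for _ in range(n_candidates)]
    return [c for c in candidates if quality_score(c) >= threshold]
```

The surviving prompts would then serve as training data, e.g. with RL on verifiable tasks or preference optimization on non-verifiable ones, as described in the abstract.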
Primary Area: foundation or frontier models, including LLMs
Submission Number: 21005