CoT-Self-Instruct: Building high-quality synthetic prompt data for reasoning and non-reasoning tasks
Keywords: synthetic data, chain of thought, self-instruct
TL;DR: We propose CoT-Self-Instruct, a new synthetic data creation and curation pipeline that leverages LLMs' planning and reasoning capabilities.
Abstract: We propose CoT-Self-Instruct, a synthetic data generation method that instructs LLMs to first reason and plan via Chain-of-Thought (CoT) based on given seed tasks, and then generate a new synthetic example of similar quality and complexity. This is followed by a filtering step to select high-quality data using automatic metrics, which are then used for LLM training. In verifiable reasoning, our synthetic data significantly outperforms existing training datasets, such as s1k and OpenMathReasoning, when evaluated on MATH500, AMC23, AIME24, and GPQA-Diamond. For non-verifiable instruction-following tasks, our method surpasses the performance of both human and standard Self-Instruct training data on the AlpacaEval 2.0 and Arena-Hard benchmarks.
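The pipeline in the abstract has two stages: CoT-guided generation of new examples from seed tasks, then filtering by automatic quality metrics. A minimal sketch of that loop follows; `call_llm`, the prompt wording, and `quality_score` are hypothetical stand-ins, since the paper's actual prompts, models, and filtering metrics are not specified here.

```python
def call_llm(prompt: str) -> str:
    # Stand-in for a real LLM API call; replace with your provider's client.
    return "REASONING: plan derived from seeds\nNEW PROMPT: a new synthetic task"

def generate_candidate(seed_tasks: list[str]) -> str:
    # Stage 1: ask the model to reason via CoT about the seed tasks,
    # then emit one new task of similar quality and complexity.
    prompt = (
        "Here are example tasks:\n"
        + "\n".join(f"- {t}" for t in seed_tasks)
        + "\nFirst reason step by step about their quality and complexity, "
          "then write ONE new task after the marker NEW PROMPT:."
    )
    reply = call_llm(prompt)
    # Keep only the generated task, discarding the chain-of-thought plan.
    return reply.split("NEW PROMPT:")[-1].strip()

def quality_score(task: str) -> float:
    # Stage 2 stand-in metric (e.g., answer consistency for verifiable
    # tasks, or a reward model score for non-verifiable ones).
    return float(len(task) > 10)

def cot_self_instruct(seed_tasks: list[str],
                      n_candidates: int = 4,
                      threshold: float = 0.5) -> list[str]:
    # Generate several candidates, keep those passing the quality filter.
    candidates = [generate_candidate(seed_tasks) for _ in range(n_candidates)]
    return [c for c in candidates if quality_score(c) >= threshold]
```

The surviving prompts would then serve as training data, e.g. with RL on verifiable tasks or preference optimization on non-verifiable ones, as described in the abstract.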
Primary Area: foundation or frontier models, including LLMs
Submission Number: 21005