Keywords: reasoning, code generation, large language model
Abstract: Large Language Models have achieved strong performance on reasoning tasks, solving competition-level coding and math problems. However, their scalability is limited by human-labeled datasets and the lack of large-scale, challenging coding problems for training. Existing competitive coding datasets contain only thousands to tens of thousands of problems. Previous synthetic data generation methods rely on either augmenting existing instruction datasets or selecting challenging problems from human-labeled data. In this paper, we propose QueST, a novel framework that combines difficulty-aware graph sampling for prompt construction with difficulty-aware rejection fine-tuning, directly optimizing specialized generators to create challenging coding problems. Our trained generators demonstrate a superior ability to create challenging problems compared even to proprietary models such as GPT-4o. We leverage this method to generate large-scale synthetic coding problems, which we then use either to distill from long Chain-of-Thought (CoT) models or to conduct reinforcement learning on smaller models; the approach proves effective in both scenarios. Our distilled model outperforms similarly sized models trained on previous long CoT SFT datasets. By training generators to create more difficult problems, QueST pushes the boundaries of reasoning abilities in large language models.
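To make the difficulty-aware rejection step more concrete, here is a minimal sketch of one plausible selection loop, assuming difficulty is proxied by a solver model's pass rate on each generated problem and that only sufficiently hard problems are retained for fine-tuning the generator; the function and parameter names are hypothetical and not taken from the paper.

```python
# Minimal sketch of difficulty-aware rejection for generator fine-tuning.
# Assumption (not from the paper text): difficulty is estimated as 1 minus a
# proxy solver's pass rate, and only hard problems are kept. Names are hypothetical.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CodingProblem:
    prompt: str           # problem statement produced by the generator
    reference_tests: str  # tests used to check candidate solutions


def estimate_difficulty(problem: CodingProblem,
                        solve: Callable[[str], str],
                        run_tests: Callable[[str, str], bool],
                        n_attempts: int = 8) -> float:
    """Return 1 - pass rate of a proxy solver; higher means harder."""
    passes = sum(run_tests(solve(problem.prompt), problem.reference_tests)
                 for _ in range(n_attempts))
    return 1.0 - passes / n_attempts


def select_hard_problems(problems: List[CodingProblem],
                         solve: Callable[[str], str],
                         run_tests: Callable[[str, str], bool],
                         min_difficulty: float = 0.75) -> List[CodingProblem]:
    """Rejection step: keep only problems at or above the difficulty threshold."""
    return [p for p in problems
            if estimate_difficulty(p, solve, run_tests) >= min_difficulty]
```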
Primary Area: foundation or frontier models, including LLMs
Submission Number: 21237