Learning to Solve Complex Problems via Dataset Decomposition

Wanru Zhao; Lucas Caccia; Zhengyan Shi; Minseon Kim; Weijia Xu; Alessandro Sordoni

Published: 18 Sept 2025, Last Modified: 29 Oct 2025. Venue: NeurIPS 2025 poster. License: CC BY 4.0

Keywords: language models, decomposition, math reasoning

Abstract: Curriculum learning is a class of training strategies that orders the data shown to a model by difficulty, progressing gradually from simpler to more complex examples. This research explores a reverse curriculum generation approach that recursively decomposes complex datasets into simpler, more learnable components. We propose a teacher-student framework in which the teacher can reason step-by-step and uses that ability to recursively generate easier versions of examples, enabling the student model to progressively master difficult tasks. We also propose a novel scoring system that measures the difficulty of data based on its structural complexity and conceptual depth, allowing a curriculum to be constructed over the decomposed data. Experiments on math datasets (MATH and AIME) and code generation datasets demonstrate that models trained with curricula generated by our approach outperform models trained with standard training on the original datasets.
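The pipeline described in the abstract (recursive decomposition, difficulty scoring, easy-to-hard ordering) can be illustrated with a minimal Python sketch. This is not the authors' implementation: `Problem`, `toy_teacher`, `toy_difficulty`, `decompose`, and `build_curriculum` are hypothetical names, the teacher here is a string-splitting stand-in for a step-by-step reasoning model, and the difficulty score is a placeholder step count rather than the paper's structural-complexity and conceptual-depth score.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Problem:
    text: str
    depth: int = 0  # recursion level at which this problem was generated


def toy_teacher(problem: Problem) -> List[Problem]:
    """Hypothetical stand-in for the teacher model.

    Splits a multi-step problem at its first " then " into two easier
    pieces. A real teacher would be a language model that reasons
    step by step; this is an assumption made for illustration only.
    """
    parts = problem.text.split(" then ", 1)
    if len(parts) < 2:
        return []  # already atomic, nothing easier to generate
    return [Problem(p.strip(), problem.depth + 1) for p in parts]


def toy_difficulty(problem: Problem) -> int:
    """Placeholder difficulty score: the number of steps in the text."""
    return problem.text.count(" then ") + 1


def decompose(
    problem: Problem,
    teacher: Callable[[Problem], List[Problem]],
    max_depth: int = 3,
) -> List[Problem]:
    """Recursively collect a problem together with its easier versions."""
    collected = [problem]
    if problem.depth >= max_depth:
        return collected
    for child in teacher(problem):
        collected.extend(decompose(child, teacher, max_depth))
    return collected


def build_curriculum(
    problems: List[Problem],
    score: Callable[[Problem], int],
) -> List[Problem]:
    """Order problems from easiest to hardest (stable sort)."""
    return sorted(problems, key=score)


root = Problem("add 2 and 3 then double the result then subtract 1")
curriculum = build_curriculum(decompose(root, toy_teacher), toy_difficulty)

for item in curriculum:
    print(toy_difficulty(item), item.text)
```

In this sketch the original problem ends up last in the curriculum, after the easier pieces derived from it, which mirrors the easy-to-hard training order the abstract describes. How the student is actually trained on that order is outside the scope of the sketch.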

Supplementary Material: zip

Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)

Submission Number: 18043
