Infinite Problem Generator: Verifiably Scaling Physics Reasoning Data with Agentic Workflows

ACL ARR 2026 January Submission 5475 Authors

05 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: synthetic data, physics reasoning, agentic workflows, executable verification, large language models
Abstract: Training large language models for complex reasoning is bottlenecked by the scarcity of verifiable, high-quality data. In domains such as physics, standard text augmentation often introduces hallucinations, while static benchmarks lack the reasoning traces required for fine-tuning. We introduce the Infinite Problem Generator (IPG), an agentic framework that synthesizes physics problems while enforcing solvability through a "Formula-as-Code'' paradigm. Unlike probabilistic text generation, IPG constructs solutions as executable Python programs, enforcing strict mathematical consistency. As a proof-of-concept, we release ClassicalMechanicsV1, a high-fidelity corpus of 1,335 classical mechanics problems expanded from 165 expert seeds. The corpus demonstrates high structural diversity, spanning 102 unique physical formulas with an average complexity of 3.05 formulas per problem. Furthermore, we identify a "Complexity Blueprint'', demonstrating a strong linear correlation ($R^2 \approx 0.95$) between formula count and verification code length. This relationship establishes code complexity as a precise, proxy-free metric for problem difficulty, enabling controllable curriculum generation. We release the full IPG pipeline, the ClassicalMechanicsV1 dataset, and our evaluation tools to support reproducible research in reasoning-intensive domains.
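The "Formula-as-Code" paradigm described in the abstract can be illustrated with a minimal sketch: a problem's solution is expressed as an executable Python program composed of named formula functions, so solvability and internal consistency are checked by running the code rather than trusting generated text. All function and variable names below are hypothetical illustrations, not taken from the IPG codebase.

```python
# Hypothetical sketch of a "Formula-as-Code" style solution program.
# Each physical formula is a pure function; the solver composes them,
# and executable assertions verify mathematical consistency.
import math

def kinetic_energy(m, v):
    """KE = (1/2) m v^2"""
    return 0.5 * m * v**2

def gravitational_pe(m, h, g=9.81):
    """PE = m g h"""
    return m * g * h

def speed_from_energy(ke, m):
    """v = sqrt(2 KE / m)"""
    return math.sqrt(2 * ke / m)

def solve_drop_problem(m, h):
    """A block of mass m is dropped from height h; find impact speed
    via energy conservation, composing two formulas."""
    pe = gravitational_pe(m, h)   # formula 1: potential energy
    ke = pe                       # conservation: PE converts to KE
    return speed_from_energy(ke, m)  # formula 2: invert KE for v

# Executable verification: recompute the answer and assert that the
# kinetic energy at impact equals the initial potential energy.
m, h = 2.0, 5.0
v = solve_drop_problem(m, h)
assert abs(kinetic_energy(m, v) - gravitational_pe(m, h)) < 1e-9
print(round(v, 3))  # prints 9.905
```

In this style, the "formula count" of a problem (here, two) is directly visible in the program structure, which is consistent with the abstract's observation that verification code length tracks formula count.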
Paper Type: Long
Research Area: Mathematical, Symbolic, Neurosymbolic, and Logical Reasoning
Research Area Keywords: logical reasoning, math QA, mathematical NLP, chain-of-thought, neurosymbolic approaches, reasoning, corpus creation, data augmentation, evaluation methodologies
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English
Submission Number: 5475