Keywords: AI Scientist, Synthetic Data, Agentic Task Synthesis, Coding Agents
TL;DR: We present a synthetic task generation system for training coding agents on machine learning tasks.
Abstract: With the advent of AI agents, automatic scientific discovery has become a tenable
goal. Many recent works scaffold agentic systems that can perform machine
learning research, but do not offer a principled way to train such agents; current
LLMs often generate plausible-looking but ineffective ideas. To make progress on
training agents that can learn from doing, we provide a novel synthetic environment
generation pipeline targeting machine learning agents. Our pipeline automatically
synthesizes machine learning challenges compatible with the SWE-agent framework
(Yang et al., 2024), covering topic sampling, dataset proposal, and code generation.
The resulting synthetic tasks are 1) grounded in real machine learning datasets,
since each proposed dataset is verified against the Hugging Face API, and 2)
checked for quality with a self-debugging loop. To validate the effectiveness
of our synthetic tasks, we tackle MLGym (Nathani et al., 2025), a benchmark for
machine learning tasks. From the synthetic tasks, we sample trajectories from
a teacher model (GPT-5; Singh et al., 2024), then use the trajectories to train
student models (Qwen3-4B and Qwen3-8B; Yang et al., 2025a). The student models
trained with our synthetic tasks achieve improved performance on MLGym, raising
the AUP metric by 9% for Qwen3-4B and 12% for Qwen3-8B.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 125