Keywords: Large Language Model, Reinforcement Learning, Coding Agent, Self-Improvement
Abstract: Language models have shown significant promise on complex reasoning and coding tasks. However, coding for machine learning engineering presents unique challenges due to the iterative nature of development, long execution times, and the need for continuous self-improvement. In this paper, we introduce MLE-RL, a coding agent trained with reinforcement learning to address these challenges. Our approach reframes the learning process by breaking long-horizon trajectories into single-step optimizations. We employ a reinforcement learning strategy that selectively learns from the most informative attempts, optimizing the policy only on valuable steps. In addition, to overcome context limitations, our agent uses a scaffold with a memory module that stores and recalls high-performing past solutions, enabling cumulative learning. Evaluation on MLE-Bench demonstrates that our MLE-RL-32B achieves a 4.9% improvement over the baseline model in competition ranking on ML tasks and performs competitively against state-of-the-art open-source models such as DeepSeek-R1-0528.
Primary Area: foundation or frontier models, including LLMs
Submission Number: 17331