Abstract: We present SWE-Gym, the first environment for training real-world software engineering (SWE) agents. SWE-Gym contains 2,438 real-world Python task instances, each comprising a codebase with an executable runtime environment, unit tests, and a task specified in natural language. We use SWE-Gym to train language-model-based SWE agents, achieving up to 19% absolute gains in resolve rate on the popular SWE-Bench Verified and Lite test sets. We also experiment with inference-time scaling through verifiers trained on agent trajectories sampled from SWE-Gym. Combined with our fine-tuned SWE agents, these verifiers achieve 32.0% and 26.0% on SWE-Bench Verified and Lite, respectively, setting a new state of the art for open-weight SWE agents. To facilitate further research, we publicly release SWE-Gym, our models, and agent trajectories.
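The verifier-guided inference-time scaling described above amounts to best-of-n selection: sample several agent rollouts per task, score each with the trained verifier, and keep the highest-scoring one. Below is a minimal sketch of that selection step. The `Trajectory` type, the `verifier_score` callable, and the stand-in scorer are all illustrative assumptions, not the released SWE-Gym code; in the paper, the scorer is a verifier model fine-tuned on trajectories sampled from SWE-Gym.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trajectory:
    """One sampled agent rollout: the actions taken and the resulting patch.

    Hypothetical container for illustration; not from the SWE-Gym release.
    """
    actions: List[str]
    patch: str


def best_of_n(
    trajectories: List[Trajectory],
    verifier_score: Callable[[Trajectory], float],
) -> Trajectory:
    """Return the rollout the verifier rates most likely to resolve the issue."""
    return max(trajectories, key=verifier_score)


if __name__ == "__main__":
    candidates = [
        Trajectory(actions=["open foo.py", "edit foo.py"], patch="diff for foo.py"),
        Trajectory(actions=["open bar.py", "edit bar.py"], patch="diff for bar.py"),
    ]
    # Stand-in heuristic scorer (prefers shorter patches) purely so the
    # sketch runs; a trained verifier model would replace this lambda.
    chosen = best_of_n(candidates, verifier_score=lambda t: -len(t.patch))
    print(chosen.actions)
```

The key design point is that the verifier is applied only at inference time, so the policy agent and the verifier can be trained separately on the same pool of sampled trajectories.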
Lay Summary: Software engineering is increasingly being assisted by AI, promising to make coding faster and more accessible. But training these AI “engineers” is hard: they need a rich environment where they can learn by solving real tasks in real codebases—something past tools haven’t offered. That’s where our work, SWE-Gym, comes in.
SWE-Gym is a new “training ground” built from 2,438 real-world GitHub issues. Each task comes with a full Python project, unit tests that check the AI’s work, and a clear problem description. Using this setup, we trained two AI models that work together: one proposes code fixes, and another evaluates them. This duo already solves about one-third of the hardest bugs in a standard test set—better than any open AI system to date—and it gets smarter with more practice.
By releasing SWE-Gym, along with our code, models, and results, we hope to jumpstart research into trustworthy, open, and reproducible AI software agents that can assist in real-world programming.
Link To Code: https://github.com/SWE-Gym/SWE-Gym
Primary Area: Deep Learning->Large Language Models
Keywords: Agents, Software Engineering Agents, Post-training
Submission Number: 9345