Keywords: agents, coding agents, synthetic data, swe agents, large language models, agent training, supervised fine-tuning
TL;DR: Training SWE-agents using synthetically generated bugs collected from mistakes made by agents asked to add features.
Abstract: Training the next generation of language-model-based software engineering (SWE) agents requires bug-fixing data sourced from high-quality bugs. We introduce a novel method for synthetically generating difficult and diverse bugs. Our method instructs SWE agents to introduce a feature into a codebase and collects the unintentionally buggy changes caught by test failures. Prior approaches focus on generating bugs intentionally (e.g., by locally perturbing existing code) and do not reflect realistic development processes, yielding out-of-distribution bugs. Qualitative analysis demonstrates that our approach generates bugs that more closely reflect the patterns found in human-authored edits. Experiments show that our bugs provide more efficient training data for supervised fine-tuning, outperforming models trained on pre-existing bug datasets from SWE-Smith and R2E-Gym by over 4%. Finally, training on our newly generated bugs in addition to existing bug datasets yields FrogBoss, a state-of-the-art 32B model on SWE-Bench Verified with a pass@1 of 54.6% averaged over three seeds. Our FrogBoss recipe generalizes across model sizes, as demonstrated by FrogMini (14B) and FrogBeast (235B), which achieve SWE-Bench Verified pass@1 of 45.3% and 61.0%, respectively.
Submission Number: 79