TL;DR: In this paper, we propose PatchPilot, an agentic patcher that strikes a balance between patching efficacy, stability, and cost-efficiency.
Abstract: Recent research builds various patching agents that combine large language models (LLMs) with non-ML tools and achieve promising results on the state-of-the-art (SOTA) software patching benchmark, SWE-bench.
Based on how they determine the patching workflow, existing patching agents can be categorized as agent-based planning methods, which rely on LLMs for planning, and rule-based planning methods, which follow a pre-defined workflow.
At a high level, agent-based planning methods achieve high patching performance, but at high cost and with limited stability.
Rule-based planning methods, on the other hand, are more stable and efficient but have key workflow limitations that compromise their patching performance.
In this paper, we propose PatchPilot, an agentic patcher that strikes a balance between patching efficacy, stability, and cost-efficiency.
PatchPilot proposes a novel rule-based planning workflow with five components: reproduction, localization, generation, validation, and refinement (where refinement is unique to PatchPilot).
We introduce novel and customized designs for each component to optimize its effectiveness and efficiency.
In extensive experiments on the SWE-bench benchmarks, PatchPilot outperforms existing open-source methods while maintaining low cost (less than $1 per instance) and ensuring higher stability.
We also conduct a detailed ablation study to validate the key designs in each component.
Our code is available at https://github.com/ucsb-mlsec/PatchPilot.
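As a rough illustration of what a rule-based planning workflow with these five components could look like, the sketch below wires the stages together in a fixed order. It is not PatchPilot's implementation: the `Stages` container, the `run_workflow` function, the stage signatures, the stage ordering, and the `max_refinements` budget are all hypothetical names and assumptions introduced here, and the toy lambdas stand in for the LLM and tool calls a real system would make.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Stages:
    """Hypothetical stage callables (assumed signatures, not PatchPilot's API)."""
    reproduce: Callable[[str], str]            # issue -> reproduction test
    localize: Callable[[str, str], List[str]]  # issue, repro -> suspect locations
    generate: Callable[[str, List[str]], str]  # issue, locations -> candidate patch
    validate: Callable[[str, str], bool]       # patch, repro -> does it pass?
    refine: Callable[[str, str], str]          # failing patch, repro -> revised patch


def run_workflow(issue: str, stages: Stages, max_refinements: int = 3) -> Optional[str]:
    """Run the stages in a fixed, pre-defined order (no LLM-driven planning)."""
    repro = stages.reproduce(issue)
    locations = stages.localize(issue, repro)
    patch = stages.generate(issue, locations)
    # Assumed policy: refine a failing patch instead of regenerating from scratch.
    for _ in range(max_refinements):
        if stages.validate(patch, repro):
            return patch
        patch = stages.refine(patch, repro)
    # Check the last refinement; give up (None) if it still fails.
    return patch if stages.validate(patch, repro) else None


# Toy stand-ins for the five stages, for illustration only.
toy = Stages(
    reproduce=lambda issue: "assert add(1, 2) == 3",
    localize=lambda issue, repro: ["calc.py:add"],
    generate=lambda issue, locs: "return a - b",
    validate=lambda patch, repro: patch == "return a + b",
    refine=lambda patch, repro: patch.replace("-", "+"),
)
result = run_workflow("add() returns the wrong sum", toy)
```

In this toy run the first candidate fails validation and a single refinement step repairs it. Because the control flow is fixed in code, the number of stage invocations per issue is bounded, which is one plausible reason a rule-based workflow can be cheaper and more stable than agent-based planning.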
Lay Summary: Automatically fixing software bugs remains challenging: AI-driven tools can generate code patches but often sacrifice cost-efficiency, reliability, or effectiveness. We introduce PatchPilot, which guides bug repair through a clear five-step process (reproducing the error, pinpointing its location, generating a fix, validating the result, and refining the solution) to deliver accurate patches while keeping costs below $1 per bug. On the SWE-bench benchmark, PatchPilot outperforms existing open-source methods with greater consistency and affordability, offering developers a faster, more reliable way to maintain healthy code.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/ucsb-mlsec/PatchPilot
Primary Area: Applications->Everything Else
Keywords: large language model, automatic program repair, autonomous software improvement, autonomous software engineering
Submission Number: 8862