Fixpad++: Automated Bug Fix Verification Using LLM Agents

Published: 28 Mar 2026, Last Modified: 28 Mar 2026 | AIware 2026 | CC BY 4.0
Keywords: Automated Bug Fix Verification, Large Language Models, Agents, Software Testing, Automated Patch Correctness Assessment, Automated Testing for Desktop Applications, Execution-based Bug Fix Verification
TL;DR: Fixpad++ uses multi-modal LLM agents to automatically reproduce crash bugs in GUI-based desktop apps and verify whether patches actually fix them, achieving 87.5% accuracy in verifying valid fixes on real Notepad++ bugs.
Abstract: Verifying bug fixes before patches are released to end users is a critical step in the software development lifecycle. However, this process is often manual, repetitive, and error-prone, especially for crash bugs triggered through Graphical User Interface (GUI) interactions in desktop applications. Despite recent advancements in LLM-driven software agents, existing work primarily targets bug reproduction without addressing fix verification, while approaches that do focus on verification rely on source code access, making them inapplicable to closed-source GUI-based desktop applications. This paper introduces Fixpad++, a framework designed to automatically verify bug fixes in the Notepad++ desktop application using LLM-powered agents. Fixpad++ employs a two-phase approach. First, a multi-modal multi-agent system interacts with the buggy version to reproduce the reported crash using visual parsing and LLM reasoning. Second, upon successful reproduction, a trajectory replay mechanism executes the recorded action sequence on the patched version to validate the fix. We evaluated Fixpad++ on FixPad-Bench, a new dataset of 105 evaluation instances derived from 22 real-world Notepad++ crash bugs, including valid and invalid patches. The system achieved a reproduction success rate of 72.73% with an average time of 174.07 seconds. Among the successfully reproduced cases, Fixpad++ correctly verified valid fixes with 87.50% accuracy and detected invalid fixes with 77.05% accuracy, outperforming OpenAI’s Computer-Using Agent (CUA). Fixpad++ demonstrates the effectiveness of specialized LLM agent architectures for automated bug fix verification in GUI-based desktop applications, offering a practical solution for automating verification workflows without requiring access to source code.
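The two-phase flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Action`, `Executor`, `replay`, and `verify_fix` are hypothetical, the reproduction phase is abstracted to "a recorded trajectory exists or it does not", and the sketch assumes a fix is judged valid when replaying the recorded actions on the patched build no longer crashes.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Action:
    """One recorded GUI step (hypothetical schema, not from the paper)."""
    kind: str    # e.g. "click", "type", "hotkey"
    target: str  # e.g. a menu label or the text to type


# Hypothetical executor: applies one action to a running application
# and returns True if the application crashed as a result.
Executor = Callable[[Action], bool]


def replay(trajectory: List[Action], execute: Executor) -> bool:
    """Replay the recorded actions in order; return True on the first crash."""
    for action in trajectory:
        if execute(action):
            return True
    return False


def verify_fix(trajectory: Optional[List[Action]], run_on_patched: Executor) -> str:
    """Phase 2: judge a patch by replaying the phase-1 trajectory on it.

    `trajectory` is None when phase 1 failed to reproduce the crash, in
    which case no verdict about the patch can be given.
    """
    if trajectory is None:
        return "not_reproduced"
    crashed = replay(trajectory, run_on_patched)
    return "fix_invalid" if crashed else "fix_valid"
```

In this sketch, a call such as `verify_fix(recorded, patched_executor)` returns one of three labels, which keeps the "could not reproduce" outcome separate from the two patch verdicts.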
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public.
Paper Type: Full-length papers (i.e. case studies, theoretical, applied research papers). 8 pages
Submission Number: 17