RepoAudit: An Autonomous LLM-Agent for Repository-Level Code Auditing

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0
TL;DR: LLM-powered code auditing technique for detecting security vulnerabilities at the repository level
Abstract: Code auditing is the process of reviewing code with the aim of identifying bugs. Large Language Models (LLMs) have demonstrated promising capabilities for this task without requiring compilation, while also supporting user-friendly customization. However, auditing a code repository with LLMs poses significant challenges: limited context windows and hallucinations can degrade the quality of bug reports, and analyzing large-scale repositories incurs substantial time and token costs, hindering efficiency and scalability. This work introduces an LLM-based agent, RepoAudit, designed to perform autonomous repository-level code auditing. Equipped with agent memory, RepoAudit explores the codebase on demand by analyzing data-flow facts along feasible program paths within individual functions. It further incorporates a validator module to mitigate hallucinations by verifying data-flow facts and checking the satisfiability of path conditions associated with potential bugs, thereby reducing false positives. RepoAudit detects 40 true bugs across 15 real-world benchmark projects with a precision of 78.43%, requiring on average only 0.44 hours and $2.54 per project. It also detects 185 new bugs in high-profile projects, of which 174 have been confirmed or fixed. We have open-sourced RepoAudit at https://github.com/PurCL/RepoAudit.
Lay Summary: Modern software often contains subtle bugs that can crash systems or create security vulnerabilities. Manually reviewing millions of lines of code is difficult, and current AI tools often struggle with large files or generate false alarms. We developed RepoAudit, an AI assistant that analyzes codebases by tracking how values move through different execution paths. It builds a memory of these paths and uses that context to ask, “Could this value cause a problem?” Before reporting a potential issue, it runs a built-in check to reduce false positives. In evaluations on 15 widely used open-source projects, RepoAudit identified 40 real bugs with nearly 80% precision, at an average cost of less than a cup of coffee per project. On larger, high-profile codebases, it reported 185 new issues, 174 of which have already been confirmed or fixed by developers. By catching issues early, RepoAudit helps save time and reduce the cost and risk of bugs. It's free, open-source, and ready to use.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/PurCL/RepoAudit
Primary Area: Applications
Keywords: agent, code reasoning, code auditing, bug detection
Submission Number: 5175
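
The benchmark figures quoted in the abstract can be sanity-checked with a few lines of arithmetic. The sketch below assumes the usual definition of precision, true positives divided by all reported bugs; the abstract gives neither the total number of reports nor the number of false positives, so both are inferred here rather than quoted. The function name `implied_report_count` is hypothetical, and the totals are approximate because the per-project averages are already rounded.

```python
# Figures quoted in the abstract.
TRUE_BUGS = 40
REPORTED_PRECISION_PCT = 78.43
PROJECTS = 15
AVG_HOURS_PER_PROJECT = 0.44
AVG_COST_USD_PER_PROJECT = 2.54


def implied_report_count(true_positives, precision_pct, max_extra=1000):
    """Return the smallest number of total reports whose precision,
    rounded to two decimals, matches precision_pct (None if not found).

    Assumption: precision = true_positives / total_reports.
    """
    for total in range(true_positives, true_positives + max_extra + 1):
        if round(100 * true_positives / total, 2) == precision_pct:
            return total
    return None


total_reports = implied_report_count(TRUE_BUGS, REPORTED_PRECISION_PCT)
false_positives = total_reports - TRUE_BUGS

# Approximate totals across the benchmark, derived from rounded averages.
total_hours = round(PROJECTS * AVG_HOURS_PER_PROJECT, 2)
total_cost_usd = round(PROJECTS * AVG_COST_USD_PER_PROJECT, 2)

print(f"implied reports: {total_reports}, implied false positives: {false_positives}")
print(f"approx. total time: {total_hours} h, approx. total cost: ${total_cost_usd}")
```

Under that assumption, the quoted precision is consistent with 40 true bugs among 51 reports, and the averages imply roughly 6.6 hours and about $38 for the whole 15-project benchmark.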