GREPO: A Benchmark for Graph Neural Networks on Repository-Level Bug Localization

16 Sept 2025 (modified: 14 Feb 2026), ICLR 2026 Conference Withdrawn Submission, CC BY 4.0
Keywords: Code Agent, Graph
Abstract: Repository-level bug localization, the task of identifying where code must be modified to fix a bug, is a critical software engineering challenge. Standard Large Language Models (LLMs) are often unsuitable for this task due to context window limitations that prevent them from processing entire code repositories. Moreover, the intricate dependencies between code entities mean that bug localization often requires multi-hop reasoning across the repository's structure. Existing approaches typically treat this as an information retrieval (IR) problem, relying on heuristics like keyword matching and text similarity. While some methods incorporate repository graph structures, they often employ simplistic traversal algorithms (e.g., Breadth-First Search). Graph Neural Networks (GNNs) present a promising alternative with their inherent capacity to model complex, repository-wide dependencies, but the absence of a dedicated benchmark has hindered their application. To bridge this gap, we introduce GREPO, the first benchmark designed for repository-scale bug localization using GNNs. It comprises 109 Python repositories and over 10,000 bug-fixing pull requests, offering graph-based data structures ready for direct GNN processing. Our evaluation of various GNN architectures on a representative subset of 9 repositories in GREPO reveals their competitive performance against established information retrieval baselines. This work demonstrates the strong potential of GNNs for this task and establishes GREPO as a foundational resource for future research. Our code can be found at https://anonymous.4open.science/status/RepoGNN-57C0.
Primary Area: learning on graphs and other geometries & topologies
Submission Number: 7758