BigIssue: A Realistic Bug Localization Benchmark

Anonymous

16 Dec 2022 (modified: 05 May 2023) | ACL ARR 2022 December Blind Submission | Readers: Everyone

Abstract: As machine learning tools progress, the inevitable question arises: How can machine learning help us write better code? With significant progress in natural language processing from models like GPT-3 and BERT, the application of natural language processing techniques to code is starting to be explored. Most research has focused on automatic program repair (APR), and while the results on synthetic or highly filtered datasets are promising, such models are hard to apply in real-world scenarios because of underperforming bug localization techniques. We propose BigIssue: a realistic bug localization benchmark. The goal of the benchmark is twofold. We provide (1) a general benchmark with a diversity of real and synthetic Java bugs and (2) a motivation to improve the bug localization capabilities of models through attention to the full repository context. With the introduction of BigIssue, we hope to advance the state of the art in bug localization, in turn improving APR performance and increasing its applicability to the modern development cycle.

Paper Type: long

Research Area: NLP Applications

