Revisiting Unnaturalness for Automated Program Repair in the Era of Large Language Models

Published: 01 Jan 2025, Last Modified: 21 Jul 2025 · ICSE 2025 · CC BY-SA 4.0
Abstract: The problem of software quality has motivated the development of a variety of techniques for Automated Program Repair (APR). Meanwhile, recent advances in AI and Large Language Models (LLMs) have produced orders-of-magnitude performance improvements over previous code generation techniques, affording promising opportunities for program repair and its constituent subproblems (e.g., fault localization, patch generation). Because models are trained on large volumes of code in which defects are relatively rare, they tend both to perceive faulty code as unlikely (or "unnatural") and to produce generally correct (more "natural") code. This paper comprehensively revisits the idea of (un)naturalness for program repair. We argue that, fundamentally, LLMs can only go so far on their own in reasoning about and fixing buggy code. This motivates the incorporation of traditional tools, which compress useful contextual and analysis information, as a complement to LLMs for repair. We interrogate the role of entropy at every stage of traditional repair, and show that it is indeed usefully complementary to classic techniques. We show that combining measures of naturalness with classic Spectrum-Based Fault Localization (SBFL) approaches improves Top-5 scoring by 50% over SBFL alone. We show that entropy delta, the change in entropy induced by a candidate patch, can improve patch generation efficiency, saving an average of 24 test-suite executions per repair on our dataset. Finally, we show that entropy delta is highly effective for patch classification, distinguishing correct from overfitting patches. Overall, our results suggest that LLMs can effectively complement classic techniques for analysis and transformation, producing more efficient and effective automated repair techniques overall.
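As a rough illustration only (not the authors' implementation), the "entropy delta" signal described in the abstract can be approximated by scoring the buggy and patched code with a code language model and comparing their mean per-token negative log-likelihoods. In the sketch below, the model choice (`gpt2`), the helper names, and the sign convention (positive delta = patch looks more "natural") are all assumptions for illustration.

```python
# Minimal sketch of an "entropy delta" score for a candidate patch.
# Assumption: any causal LM usable via Hugging Face transformers;
# "gpt2" is a stand-in, not the model used in the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def mean_token_entropy(code: str) -> float:
    """Mean negative log-likelihood (nats per token) of the code under the LM."""
    ids = tokenizer(code, return_tensors="pt").input_ids
    with torch.no_grad():
        # Cross-entropy loss averaged over tokens, i.e. the code's "unnaturalness".
        loss = model(ids, labels=ids).loss
    return loss.item()

def entropy_delta(buggy_code: str, patched_code: str) -> float:
    """Positive values mean the candidate patch made the code more 'natural'."""
    return mean_token_entropy(buggy_code) - mean_token_entropy(patched_code)

# Hypothetical usage: rank candidate patches by how much they reduce entropy.
buggy = "def max(a, b):\n    return a if a < b else b\n"
patch = "def max(a, b):\n    return a if a > b else b\n"
print(f"entropy delta: {entropy_delta(buggy, patch):+.4f}")
```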