Keywords: Automated Program Repair, Adaptive Benchmarks, Evaluation Methodology, Large Language Models, Benchmark Generation
TL;DR: Static benchmarks for program repair are structurally broken and cannot be fixed by making them bigger — we propose replacing them with adaptive, specification-driven pipelines that generate infinite, verified repair instances on demand.
Abstract: Automated Program Repair (APR) has rapidly advanced with the emergence of Large Language Models (LLMs), and modern repair systems increasingly achieve high success rates on established benchmarks — raising concerns about evaluation saturation and distributional overfitting. This paper argues that the dominant paradigm of static benchmark evaluation is structurally inadequate, and that scaling static datasets cannot resolve this inadequacy. We propose a paradigm shift toward adaptive, specification-driven benchmark generation governed by five organizing principles: generative unboundedness, specification primacy, deterministic certification, adaptive coverage, and oracle independence. We formalize these principles, develop a taxonomy of the dimensions along which repair instances vary, and argue that the generator–verifier separation is the architectural consequence that makes the framework trustworthy. We treat the oracle problem as a conceptual issue in its own right, examine the tradeoffs among available correctness criteria, and identify the open problems the framework surfaces but does not resolve.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public.
Paper Type: Full-length paper (i.e., case studies, theoretical, or applied research papers). 8 pages
Reroute: false
Submission Number: 55