Keywords: Code Repair, Large Language Models
Abstract: While recent studies have increasingly emphasized the role of reflection in code repair tasks, existing benchmarks still target only the repair generation capability of LLMs and lack fine-grained evaluation of reflection generation. To this end, we propose Code Reffix, a benchmark featuring an automated pipeline with oracle reflections and a dual-task protocol that decouples the evaluation of reflection from repair. Through extensive experiments on 14 LLMs and fine-tuning analysis, we aim to pinpoint performance bottlenecks in code repair, quantify reflection quality, and verify the value of reflection optimization. Our evaluations reveal that underdeveloped reflection capability remains a major bottleneck for code repair. By quantifying this gap, Code Reffix provides a critical foundation for optimizing LLMs toward superior repair performance.
Paper Type: Long
Research Area: Code Models
Research Area Keywords: Code Models, Language Modeling
Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English
Submission Number: 8553