ReflexiCoder: Teaching Large Language Models to Self-Reflect on Generated Code and Self-Correct It via Reinforcement Learning
Keywords: Large Language Model, Code Generation, Reinforcement Learning
Abstract: Large language models (LLMs) have demonstrated remarkable proficiency in code generation, yet they still frequently fail to produce correct solutions for complex programming tasks in a single attempt. While prior works attempt to mitigate this by incorporating external feedback such as execution results, these approaches depend heavily on environmental interaction and fail to cultivate the model's intrinsic debugging capabilities. In this work, we propose ReflexiCoder, a novel reinforcement learning (RL) framework that empowers models to autonomously self-reflect on their generated code and perform self-correction without relying on external oracles. ReflexiCoder first produces an initial solution, then repeatedly reviews the previously generated code, performs bug- and optimization-aware reflection, and conditionally rewrites the program until no issues are found or a maximum number of rounds is reached. To strictly enforce this behavior, we formulate the process as a structured trajectory and optimize it with RL, aligning the model with effective self-reflection and self-correction trajectories under a reward function specifically designed to value accurate error detection and successful repair. Extensive experiments on seven widely used benchmarks demonstrate that our resulting model, ReflexiCoder-8B, achieves state-of-the-art results among open-source code models, reaching 94.51\% and 87.20\% on HumanEval and HumanEval Plus, 81.80\% and 78.57\% on MBPP and MBPP Plus, 35.00\% on BigCodeBench, 52.21\% on LiveCodeBench, and 37.34\% on CodeForces. Furthermore, ReflexiCoder-8B is competitive with the strong proprietary GPT-5.1 model on the first five benchmarks and surpasses it on the complex reasoning benchmarks LiveCodeBench and CodeForces. To facilitate future research, we release our source code at https://anonymous.4open.science/r/ReflexiCoder.
Paper Type: Long
Research Area: Code Models
Research Area Keywords: code language models, code generation
Languages Studied: English
Submission Number: 10264