Enhancing Code Generation in Large Language Models: A Human-Like Debugging Approach for Competitive Problem Solving

ACL ARR 2026 January Submission7643 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · License: CC BY 4.0
Keywords: Code generation
Abstract: Recent advances in Large Language Models (LLMs) have shown impressive code generation capabilities, yet their reasoning often collapses when faced with algorithmically complex or error-sensitive programming tasks. This gap stems not from limited knowledge, but from a lack of self-diagnostic awareness—the ability to understand why a piece of code fails and how to repair it. We argue that effective code generation requires integrating cognitive processes analogous to human debugging: perceiving failures, tracing intermediate states, hypothesizing causes, and revising accordingly. Building on this perspective, we propose a Human-Like Debugging Architecture (HLDA) that operationalizes these processes within LLMs. HLDA systematically executes generated code, records fine-grained variable traces, interprets discrepancies, and iteratively regenerates improved solutions. Unlike prior self-editing or executor-based approaches that rely on outcome-level feedback, our method introduces trace-level causal feedback, enabling models to reason over “how the code went wrong” rather than merely “what went wrong.” This leads to substantial performance gains, setting new state-of-the-art results on HumanEval, MBPP, and contest-level benchmarks such as APPS and CodeContests.
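The abstract's distinction between outcome-level and trace-level feedback can be illustrated with a minimal sketch. The snippet below is not the authors' implementation; it is a hypothetical Python illustration of how one might execute a candidate program under `sys.settrace`, record line-by-line variable snapshots, and turn a failing test into a causal feedback string that an LLM could condition on when regenerating. All function names (`run_with_trace`, `causal_feedback`, `candidate`) are invented for this example.

```python
import sys

def run_with_trace(fn, *args):
    """Execute fn, recording a (line number, locals) snapshot per line."""
    trace = []
    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is fn.__code__:
            trace.append((frame.f_lineno, dict(frame.f_locals)))
        return tracer
    sys.settrace(tracer)
    try:
        result = fn(*args)
    finally:
        sys.settrace(None)
    return result, trace

# Hypothetical buggy candidate: intended to sum squares, but sums values.
def candidate(xs):
    total = 0
    for x in xs:
        total += x  # bug: should be x * x
    return total

def causal_feedback(fn, args, expected):
    """Outcome-level feedback would stop at 'expected X, got Y';
    trace-level feedback also exposes how intermediate state evolved."""
    result, trace = run_with_trace(fn, *args)
    if result == expected:
        return "pass"
    lines = [f"expected {expected}, got {result}; variable trace:"]
    for lineno, locs in trace:
        lines.append(f"  line {lineno}: {locs}")
    return "\n".join(lines)

feedback = causal_feedback(candidate, ([1, 2, 3],), 14)
print(feedback)
```

In this toy run the trace shows `total` advancing 0 → 1 → 3 → 6 rather than 0 → 1 → 5 → 14, localizing the fault to the accumulation step — the kind of "how it went wrong" signal the abstract contrasts with bare pass/fail outcomes.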
Paper Type: Long
Research Area: Code Models
Research Area Keywords: Code Models, AI/LLM Agents, NLP Applications
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 7643