From Code Generation to Code Reasoning: A Survey of Inference-Time Methods in LLM-Based Code Generation

ACL ARR 2026 January Submission7568 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · Readers: Everyone · License: CC BY 4.0
Keywords: Code generation
Abstract: Large language models (LLMs) have rapidly advanced the state of the art in code generation, evolving from prompt-based function synthesis to iterative, execution-guided, and agentic software engineering systems. While recent progress has led to impressive benchmark results, the growing diversity of methods and evaluation settings has also introduced fragmentation and ambiguity in how capabilities are assessed and compared. In this survey, we present a unified view of LLM-based code generation from the perspective of inference-time reasoning and interaction. We organize existing methods according to how LLMs structure generation, incorporate feedback, and interact with execution environments, covering prompt-conditioned generation, language-based self-refinement, execution-guided revision, and agentic code generation. We further review representative benchmarks across function-level, execution-grounded, repository-level, and long-horizon agentic tasks, and analyze how evaluation assumptions shape reported performance. Our analysis highlights fundamental limitations of test-based correctness metrics, risks of data contamination, and emerging challenges in evaluating iterative and agentic systems. By connecting method design choices with evaluation protocols, this survey aims to clarify current progress, expose open problems, and provide guidance for future research on reliable and scalable LLM-based code generation.
Paper Type: Long
Research Area: Code Models
Research Area Keywords: AI / LLM Agents, Code Models, Generation
Contribution Types: Surveys
Languages Studied: English
Submission Number: 7568