From Code Generation to Code Reasoning: A Survey of Inference-Time Methods in LLM-Based Code Generation

ACL ARR 2026 January Submission7568 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · Readers: Everyone · License: CC BY 4.0
Keywords: Code generation
Abstract: Large language models (LLMs) have rapidly advanced the state of the art in code generation, evolving from prompt-based function synthesis to iterative, execution-guided, and agentic software engineering systems. While recent progress has led to impressive benchmark results, the growing diversity of methods and evaluation settings has also introduced fragmentation and ambiguity in how capabilities are assessed and compared. In this survey, we present a unified view of LLM-based code generation from the perspective of inference-time reasoning and interaction. We organize existing methods according to how LLMs structure generation, incorporate feedback, and interact with execution environments, covering prompt-conditioned generation, language-based self-refinement, execution-guided revision, and agentic code generation. We further review representative benchmarks across function-level, execution-grounded, repository-level, and long-horizon agentic tasks, and analyze how evaluation assumptions shape reported performance. Our analysis highlights fundamental limitations of test-based correctness metrics, risks of data contamination, and emerging challenges in evaluating iterative and agentic systems. By connecting method design choices with evaluation protocols, this survey aims to clarify current progress, expose open problems, and provide guidance for future research on reliable and scalable LLM-based code generation.
Paper Type: Long
Research Area: Code Models
Research Area Keywords: AI / LLM Agents, Code Models, Generation
Contribution Types: Surveys
Languages Studied: English
Submission Number: 7568