Keywords: selective generation, dynamic code analysis, code generation
TL;DR: We propose a learning algorithm for selective code generation that controls the rate of hallucination by exploiting fuzzing, a dynamic code analysis method, to generate unit tests that serve as a code correctness measure.
Abstract: The hallucination of code generation models hinders their applicability to systems requiring higher safety standards. One critical bottleneck in addressing code hallucination is the difficulty of identifying the functional correctness of generated code, due to its unnatural form. We address this core bottleneck by automatically generating unit tests using dynamic code analysis tools, leveraging the executable nature of code. Accordingly, we propose a selective code generator that abstains from uncertain generations -- based on the functional correctness evaluated by generated unit tests -- to theoretically control the correctness among non-abstained answers, i.e., the false discovery rate. Finally, we propose to use generated unit tests in evaluation as well as in learning for precise code evaluation, calling this paradigm FuzzEval. We demonstrate the efficacy of our method, showing that it controls code hallucination while maintaining reasonable selection efficiency.
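To illustrate the selective-generation idea described in the abstract, the following is a minimal, hypothetical sketch: a candidate program is scored by the fraction of generated unit tests it passes, and the generator abstains whenever that score falls below a calibrated threshold. All names here (`run_unit_tests`, `selective_generate`, the threshold `tau`) are illustrative assumptions, not the authors' actual implementation, and the threshold calibration that yields the false-discovery-rate guarantee is omitted.

```python
def run_unit_tests(candidate, tests):
    """Return the fraction of (args, expected) unit tests the candidate passes."""
    passed = 0
    for args, expected in tests:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing candidate fails that test
    return passed / len(tests)

def selective_generate(candidate, tests, tau=0.9):
    """Return the candidate only if its test pass rate clears the
    threshold tau; otherwise abstain (return None). In the paper's
    setting, tau would be calibrated to control the false discovery
    rate among non-abstained answers."""
    return candidate if run_unit_tests(candidate, tests) >= tau else None

# Toy example: a correct and a buggy implementation of absolute value,
# with unit tests standing in for fuzzer-generated ones.
good = lambda x: abs(x)
buggy = lambda x: x  # wrong for negative inputs
tests = [((3,), 3), ((-2,), 2), ((0,), 0)]

print(selective_generate(good, tests) is good)    # accepted
print(selective_generate(buggy, tests) is None)   # abstained
```

The buggy candidate passes only 2 of 3 tests (a score of about 0.67), so it falls below `tau` and is rejected, while the correct candidate is returned.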
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 14828