Keywords: selective generation, dynamic code analysis, code generation
TL;DR: We propose a learning algorithm for selective code generation that controls the rate of hallucination by exploiting fuzzing, a dynamic code analysis method, to generate unit tests that serve as a code correctness measure.
Abstract: The hallucination of code generation models hinders their applicability to systems requiring higher safety standards. One critical bottleneck in addressing code hallucination is the difficulty of identifying the functional correctness of generated code, due to its unnatural form. We address this core bottleneck by automatically generating unit tests using dynamic code analysis tools, leveraging the executable nature of code. Accordingly, we propose a selective code generator that abstains from uncertain generations -- based on the functional correctness evaluated by generated unit tests -- to theoretically control the correctness among non-abstained answers, i.e., the false discovery rate. Finally, we propose to use generated unit tests in evaluation as well as in learning for precise code evaluation, calling this paradigm FuzzEval. We demonstrate the efficacy of our method, showing that it controls code hallucination while maintaining reasonable selection efficiency.
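To illustrate the selective-generation idea described in the abstract, the following is a minimal, hypothetical sketch: a candidate program is scored by the fraction of generated unit tests it passes, and the generator abstains whenever that score falls below a calibrated threshold. All names here (`run_unit_tests`, `selective_generate`, the threshold `tau`) are illustrative assumptions, not the authors' actual implementation, and the threshold calibration that yields the false-discovery-rate guarantee is omitted.

```python
def run_unit_tests(candidate, tests):
    """Return the fraction of (args, expected) unit tests the candidate passes."""
    passed = 0
    for args, expected in tests:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing candidate fails that test
    return passed / len(tests)

def selective_generate(candidate, tests, tau=0.9):
    """Return the candidate only if its test pass rate clears the
    threshold tau; otherwise abstain (return None). In the paper's
    setting, tau would be calibrated to control the false discovery
    rate among non-abstained answers."""
    return candidate if run_unit_tests(candidate, tests) >= tau else None

# Toy example: a correct and a buggy implementation of absolute value,
# with unit tests standing in for fuzzer-generated ones.
good = lambda x: abs(x)
buggy = lambda x: x  # wrong for negative inputs
tests = [((3,), 3), ((-2,), 2), ((0,), 0)]

print(selective_generate(good, tests) is good)    # accepted
print(selective_generate(buggy, tests) is None)   # abstained
```

The buggy candidate passes only 2 of 3 tests (a score of about 0.67), so it falls below `tau` and is rejected, while the correct candidate is returned.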
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 14828