Keywords: code generation, constrained decoding, correctness, llm
TL;DR: We present a constrained decoding algorithm that uses context-sensitive parsing with non-extensible regular expressions to generate semantically correct programs that can be extended to runtime guarantees
Abstract: Language Models (LMs) are increasingly being used for code generation, but ensuring the correctness of generated programs remains a significant challenge. Although imperfect code may be acceptable during software development with human oversight, domains such as video games and robotics require one-shot correctness for runtime-critical components. We present a constrained decoding algorithm for generating semantically correct programs that incorporates a context-sensitive parser, which, at each step, outputs a regular expression that satisfies a critical non-extensible property to guide the generation of the next token sequence that can continue to a correct program. To build such a context-sensitive parser, we propose a framework of a dynamic tree of parsers (ToP) during parsing, where each parser corresponds to a modular context-free grammar enriched with contextual information such as variable scopes and type constraints, with tree branches representing ambiguity in the future code segment. We demonstrate our approach through sLua, a strongly typed variant of Lua, showing that our method can generate semantically correct programs conforming to any prescribed scripting API. We further show that, with careful design, our semantic guarantees extend to runtime correctness, as validated in the application of generating game mechanics for a roguelike video game.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html
Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html
Submission Number: 306
Loading