Track: long paper (up to 9 pages)
Keywords: llm, constrained decoding, type safety, type system, code synthesis, code generation
TL;DR: We guarantee type-safety of LLM-generated code using a novel constrained decoding technique.
Abstract: Large Language Models (LLMs) have achieved notable success in code generation.
However, they still frequently produce invalid code, as they do not precisely model formal aspects of programming languages.
Constrained decoding is a promising approach to alleviate this issue and has been successfully applied to domain-specific languages and syntactic features, but it cannot yet enforce semantic properties such as well-typedness.
To address this issue, we introduce *type-aware constrained decoding*.
We develop a novel prefix automata formalism and introduce a sound approach that guarantees the existence of a type-safe completion of a partial program, based on type inference and a search over inhabitable types.
We implement type-aware constraining first for a foundational simply-typed language, then extend it to TypeScript.
In our evaluation across state-of-the-art open-weight LLMs of up to 34B parameters and various model families, type-aware constraining reduces compilation errors by $70.9$% on average and increases functional correctness by $16.2$% in code synthesis, translation, and repair tasks.
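The general idea of constrained decoding against a prefix check can be illustrated with a minimal sketch. It is not the submission's implementation: the toy language (number literals joined by `+`), the helper names `is_viable_prefix`, `is_complete`, and `constrained_decode`, and the `mock_ranker` standing in for an LLM are all assumptions made for illustration. The abstract's approach relies on type inference and a search over inhabitable types, which this sketch replaces with a fixed alternation check.

```python
from typing import Callable, List, Sequence

# Toy "typed" language (an assumption for illustration): NUM (+ NUM)*.
# String literals are plausible operands syntactically but ill-typed here.
NUM_LITERALS = {"1", "2"}


def is_viable_prefix(tokens: Sequence[str]) -> bool:
    """Return True if tokens can still be extended to a well-typed program."""
    for i, tok in enumerate(tokens):
        if i % 2 == 0:
            # Even positions must hold a number-typed operand.
            if tok not in NUM_LITERALS:
                return False
        elif tok != "+":
            # Odd positions must hold the operator.
            return False
    return True


def is_complete(tokens: Sequence[str]) -> bool:
    """A program is complete when it is viable and ends on an operand."""
    return len(tokens) % 2 == 1 and is_viable_prefix(tokens)


def constrained_decode(
    rank_tokens: Callable[[List[str]], List[str]], max_len: int
) -> List[str]:
    """Greedily pick the best-ranked token that keeps the prefix viable."""
    out: List[str] = []
    while len(out) < max_len:
        for tok in rank_tokens(out):
            if tok == "<eos>":
                if is_complete(out):
                    return out
                continue  # stopping here would leave an incomplete program
            if is_viable_prefix(out + [tok]):
                out.append(tok)
                break
        else:
            raise ValueError("no candidate token keeps the prefix viable")
    return out  # may be incomplete if max_len was reached


def mock_ranker(prefix: List[str]) -> List[str]:
    """Hypothetical stand-in for an LLM; it prefers ill-typed string operands."""
    if len(prefix) == 0:
        return ['"a"', "1", "+", "<eos>"]
    if len(prefix) == 1:
        return ["+", "<eos>", "1"]
    if len(prefix) == 2:
        return ['"a"', "<eos>", "2", "1"]
    return ["<eos>", "+", "1"]


result = constrained_decode(mock_ranker, max_len=8)
```

In this sketch the ranker's first choice, the string literal `"a"`, is rejected at both operand positions, and an early `<eos>` is skipped while the program is still incomplete, so decoding falls back to the next viable candidate each time.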
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Presenter: ~Niels_Mündler1
Format: Yes, the presenting author will attend in person if this work is accepted to the workshop.
Funding: No, the presenting author of this submission does *not* fall under ICLR’s funding aims, or has sufficient alternate funding.
Submission Number: 25