InCoder: A Generative Model for Code Infilling and Synthesis

Daniel Fried; Armen Aghajanyan; Jessy Lin; Sida Wang; Eric Wallace; Freda Shi; Ruiqi Zhong; Scott Yih; Luke Zettlemoyer; Mike Lewis

Published: 01 Feb 2023, Last Modified: 22 Jun 2025
Venue: ICLR 2023 notable top 25%
Readers: Everyone

Keywords: code generation, program synthesis, language to code

TL;DR: An infilling-capable code completion model, evaluated on tasks including language-to-code, type inference, and comment generation.

Abstract: Code is seldom written in a single left-to-right pass; instead, it is repeatedly edited and refined. We introduce InCoder, a unified generative model that can perform program synthesis (via left-to-right generation) as well as editing (via masking and infilling). InCoder is trained to generate code files from a large corpus of permissively licensed code in which regions of code have been randomly masked and moved to the end of each file, allowing code infilling with bidirectional context. Our model is the first large generative code model able to infill arbitrary regions of code, which we evaluate in a zero-shot setting on challenging tasks such as type inference, comment generation, and variable renaming. We find that the ability to condition on bidirectional context substantially improves performance on these tasks, while the model still performs comparably to left-to-right-only models pretrained at similar scale on standard program synthesis benchmarks. Our models and code will be publicly released.
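The training transformation described in the abstract (mask regions, move them to the end of the file) can be illustrated with a minimal sketch. It assumes character-level spans supplied by the caller, whereas the paper samples spans randomly over tokens; the function names `causal_mask` and `infill_prompt` and the sentinel spellings `<|mask:k|>` and `<|endofmask|>` are hypothetical placeholders, not a confirmed tokenizer API.

```python
from typing import List, Tuple

# Hypothetical sentinel spellings, chosen for illustration only.
MASK = "<|mask:{}|>"
EOM = "<|endofmask|>"


def causal_mask(doc: str, spans: List[Tuple[int, int]]) -> str:
    """Replace each (start, end) character span with a numbered sentinel
    and append the span's original contents at the end of the document,
    each preceded by its sentinel and followed by an end-of-mask marker.

    Spans must be well-formed and non-overlapping; they are numbered in
    document order.
    """
    body, tail, cursor = [], [], 0
    for k, (start, end) in enumerate(sorted(spans)):
        if start < cursor or end < start:
            raise ValueError("spans must be non-overlapping and well-formed")
        body.append(doc[cursor:start])  # text before the masked region
        body.append(MASK.format(k))     # placeholder left in place
        tail.append(MASK.format(k) + doc[start:end] + EOM)  # moved span
        cursor = end
    body.append(doc[cursor:])  # text after the last masked region
    return "".join(body) + "".join(tail)


def infill_prompt(left: str, right: str) -> str:
    """Build a single-gap infilling prompt: the model sees the left and
    right context, then generates after the trailing sentinel until it
    emits the end-of-mask marker."""
    return left + MASK.format(0) + right + MASK.format(0)


if __name__ == "__main__":
    print(causal_mask("abcdef", [(1, 3)]))
    # a<|mask:0|>def<|mask:0|>bc<|endofmask|>
```

Because the masked content is moved to the end, an ordinary left-to-right model trained on these strings learns to predict each span after having seen both its left and right context, which is what makes zero-shot infilling possible at inference time.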

Please Choose The Closest Area That Your Submission Falls Into: Applications (e.g., speech processing, computer vision, NLP)

Community Implementations: [3 code implementations](https://www.catalyzex.com/paper/incoder-a-generative-model-for-code-infilling/code)
