Keywords: Program generation, Source code, Program synthesis, Deep generative models
TL;DR: We give a method for generating type-safe programs in a Java-like language, given a small amount of syntactic information about the desired code.
Abstract: We study the problem of generating source code in a strongly typed, Java-like programming language, given a label (for example a set of API calls or types) carrying a small amount of information about the code that is desired. The generated programs are expected to respect a `"realistic" relationship between programs and labels, as exemplified by a corpus of labeled programs available during training. Two challenges in such *conditional program generation* are that the generated programs must satisfy a rich set of syntactic and semantic constraints, and that source code contains many low-level features that impede learning. We address these problems by training a neural generator not on code but on *program sketches*, or models of program syntax that abstract out names and operations that do not generalize across programs. During generation, we infer a posterior distribution over sketches, then concretize samples from this distribution into type-safe programs using combinatorial techniques. We implement our ideas in a system for generating API-heavy Java code, and show that it can often predict the entire body of a method given just a few API calls or data types that appear in the method.
Code: [![github](/images/github_icon.svg) capergroup/bayou](https://github.com/capergroup/bayou)