Structural Language Models for Any-Code Generation

Sep 25, 2019 Blind Submission
  • Keywords: Program Generation, Structural Language Model, SLM, Generative Model, Code Generation
  • TL;DR: We generate source code using a Structural Language Model over the program's Abstract Syntax Tree
  • Abstract: We address the problem of Any-Code Generation (AnyGen): generating code without any restriction on the vocabulary or structure. The state of the art for this problem is the sequence-to-sequence (seq2seq) approach, which treats code as a sequence and does not leverage any structural information. We introduce a new approach to AnyGen that leverages the strict syntax of programming languages to model a code snippet as a tree: structural language modeling (SLM). SLM estimates the probability of the program's abstract syntax tree (AST) by decomposing it into a product of conditional probabilities over its nodes. We present a neural model that computes these conditional probabilities by considering all AST paths leading to a target node. Unlike previous structural techniques, which have severely restricted the kinds of expressions that can be generated, our approach can generate arbitrary expressions in any programming language. Our model significantly outperforms both seq2seq and a variety of existing structured approaches in generating Java and C# code. We make our code, datasets, and models available online.
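The abstract describes SLM as factoring the probability of an AST A into per-node conditionals, P(A) = ∏_t P(a_t | a_<t), with each conditional computed from the AST paths that lead to the target node. The Python sketch below illustrates only that bookkeeping, under stated assumptions: nodes are generated in depth-first, left-to-right (pre-order) order; the context for a target node is approximated by the paths from every preceding terminal (leaf) to the target; and the neural model is replaced by a hypothetical `uniform_model` placeholder. The names `Node`, `context_paths`, `tree_log_prob`, and `uniform_model` are illustrative and not taken from the authors' released code.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(eq=False)  # identity comparison avoids recursing through parent/children
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def preorder(root: Node) -> List[Node]:
    """Assumed generation order: depth-first, left-to-right."""
    order = [root]
    for child in root.children:
        order.extend(preorder(child))
    return order


def ancestors(node: Optional[Node]) -> List[Node]:
    """The node itself followed by its ancestors up to the root."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain


def ast_path(src: Node, dst: Node) -> List[str]:
    """Labels along the tree path src -> lowest common ancestor -> dst."""
    up, down = ancestors(src), ancestors(dst)
    lca = next(n for n in up if n in down)
    up_part = up[: up.index(lca) + 1]    # src ... lca (inclusive)
    down_part = down[: down.index(lca)]  # dst ... child of lca
    return [n.label for n in up_part] + [n.label for n in reversed(down_part)]


def context_paths(root: Node, target: Node) -> List[List[str]]:
    """Paths from every terminal generated before `target` to `target` (simplifying assumption)."""
    order = preorder(root)
    earlier = order[: order.index(target)]
    return [ast_path(leaf, target) for leaf in earlier if not leaf.children]


# A conditional model maps (target label, context paths) to a probability.
ConditionalModel = Callable[[str, List[List[str]]], float]


def tree_log_prob(root: Node, cond_prob: ConditionalModel) -> float:
    """log P(A) = sum over nodes of log P(a_t | paths to a_t)."""
    return sum(
        math.log(cond_prob(node.label, context_paths(root, node)))
        for node in preorder(root)
    )


VOCAB = ["Assign", "Plus", "x", "y", "1"]


def uniform_model(label: str, paths: List[List[str]]) -> float:
    # Hypothetical stand-in for the neural model: ignores its context entirely.
    return 1.0 / len(VOCAB)


# Toy AST for `x = y + 1`.
root = Node("Assign")
root.add(Node("x"))
plus = root.add(Node("Plus"))
plus.add(Node("y"))
one = plus.add(Node("1"))

print(context_paths(root, one))  # [['x', 'Assign', 'Plus', '1'], ['y', 'Plus', '1']]
print(tree_log_prob(root, uniform_model))
```

A trained model would replace `uniform_model` with a network that encodes each path; the factorization of log P(A) into one term per node stays the same.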