Discriminator Based Corpus Generation for General Code Synthesis

Alexander Wild; Barry Porter

25 Sept 2019 (modified: 05 May 2023) | ICLR 2020 Conference Blind Submission | Readers: Everyone

Keywords: Code Synthesis, Neural Code Synthesis

TL;DR: A way to generate training corpora for neural code synthesis using a discriminator trained on unlabelled data

Abstract: Current work on neural code synthesis trains increasingly sophisticated architectures on highly simplified domain-specific languages, with training programs sampled uniformly from the program space of those languages. By comparison, the program space of a C-like language is vast and extremely sparsely populated with 'useful' functionality, so effective training requires a far more intelligent approach to corpus generation. We use genetic programming, guided by an iteratively retrained discriminator, to produce a population suitable as labelled training data for a neural code synthesis architecture. We demonstrate that a discriminator-based training corpus generator, trained using only unlabelled problem specifications in classic Programming-by-Example format, greatly improves network performance compared to current uniform sampling techniques.
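The loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The toy instruction set `OPS`, the fixed `INPUTS`, the hand-written `features` function, and the difference-of-means linear discriminator are all assumptions chosen to keep the example small. It shows only the general shape: unlabelled input/output specifications train a discriminator, the discriminator ranks a population of generated programs, and the surviving programs are emitted with their I/O behaviour as labelled pairs.

```python
import random

# Hypothetical toy instruction set (an assumption, not the paper's language).
OPS = {
    "inc": lambda x: x + 1,
    "dec": lambda x: x - 1,
    "dbl": lambda x: x * 2,
    "neg": lambda x: -x,
    "sq": lambda x: x * x,
}
INPUTS = [0, 1, 2, 3]  # fixed example inputs for every specification


def run(program, x):
    """Apply each op of the program to x in order."""
    for op in program:
        x = OPS[op](x)
    return x


def features(outputs):
    """Crude bounded features of one I/O behaviour (illustrative only)."""
    n = len(outputs)
    steps = list(zip(outputs, outputs[1:]))
    return [
        sum(o >= 0 for o in outputs) / n,           # share of non-negative outputs
        sum(b > a for a, b in steps) / len(steps),  # share of increasing steps
        len(set(outputs)) / n,                      # share of distinct outputs
    ]


def train_discriminator(real, fake):
    """Fit a difference-of-means linear scorer: positive means 'looks real'."""
    mean = lambda rows: [sum(col) / len(rows) for col in zip(*rows)]
    m_real, m_fake = mean(real), mean(fake)
    w = [r - f for r, f in zip(m_real, m_fake)]
    b = sum(wi * (r + f) / 2 for wi, r, f in zip(w, m_real, m_fake))
    return lambda feat: sum(wi * x for wi, x in zip(w, feat)) - b


def evolve(spec_outputs, generations=10, pop_size=30, seed=0):
    """Evolve programs whose behaviour the discriminator rates as spec-like."""
    rng = random.Random(seed)
    names = sorted(OPS)
    real = [features(o) for o in spec_outputs]  # unlabelled specifications
    pop = [[rng.choice(names) for _ in range(rng.randint(1, 4))]
           for _ in range(pop_size)]
    for _ in range(generations):
        behaviours = [features([run(p, x) for x in INPUTS]) for p in pop]
        # Retrain the discriminator against the current population.
        score = train_discriminator(real, behaviours)
        ranked = sorted(zip((score(f) for f in behaviours), pop),
                        key=lambda t: t[0], reverse=True)
        parents = [p for _, p in ranked[:pop_size // 2]]
        children = []
        for parent in parents:
            child = list(parent)
            child[rng.randrange(len(child))] = rng.choice(names)  # point mutation
            children.append(child)
        pop = parents + children
    # Labelled corpus: each program paired with the outputs it produces.
    return [(p, [run(p, x) for x in INPUTS]) for p in pop]


corpus = evolve([[0, 1, 4, 9], [1, 2, 3, 4], [0, 2, 4, 6]])
```

Each entry of `corpus` pairs a program with its outputs on `INPUTS`, which is the form of labelled example a synthesis network could be trained on. A realistic version would replace the hand-written features and linear scorer with a learned model and add crossover alongside mutation.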
