Syntax-Directed Variational Autoencoder for Structured Data


Nov 07, 2017 (modified: Dec 07, 2017) ICLR 2018 Conference Blind Submission readers: everyone Show Bibtex
  • Abstract: Deep generative models have been enjoying success in modeling continuous data. However it remains challenging to capture the representations for discrete structures with formal grammars and semantics, \eg, computer programs and molecular structures. How to generate both syntactically and semantically correct data still remains largely an open problem. Inspired by the theory of compiler where syntax and semantics check is done via syntax-directed translation (SDT), we propose a novel syntax-directed variational autoencoder (SD-VAE) by introducing \emph{stochastic lazy attributes}. This approach converts the offline SDT check into on-the-fly generated guidance for constraining the decoder. Comparing to the state-of-the-art methods, our approach enforces constraints on the output space so that the output will be not only \emph{syntactically} valid, but also \emph{semantically} reasonable. We evaluate the proposed model with applications in programming language and molecules, including reconstruction and program/molecule optimization. The results demonstrate the effectiveness in incorporating syntactic and semantic constraints in discrete generative models, which is significantly better than current state-of-the-art approaches.
  • TL;DR: A new generative model for discrete structured data. The proposed stochastic lazy attribute converts the offline semantic check into online guidance for stochastic decoding, which effectively addresses the constraints in syntax and semantics, and also achieves superior performance
  • Keywords: generative model for structured data, syntax-directed generation, molecule and program optimization, variational autoencoder