All SMILES Variational Autoencoder for Molecular Property Prediction and Optimization

Zaccary Alperstein; Artem Cherkasov; Jason Rolfe

All SMILES Variational Autoencoder for Molecular Property Prediction and Optimization

Zaccary Alperstein, Artem Cherkasov, Jason Rolfe

25 Sept 2019 (modified: 05 May 2023)ICLR 2020 Conference Blind SubmissionReaders: Everyone

Keywords: generative modelling, variational autoencoder, chemistry, cheminformatics, chemoinformatics, molecular property optimization

TL;DR: We pool messages amongst multiple SMILES strings of the same molecule to pass information along all paths through the molecular graph, producing latent representations that significantly surpass the state-of-the-art in a variety of tasks.

Abstract: Variational autoencoders (VAEs) defined over SMILES string and graph-based representations of molecules promise to improve the optimization of molecular properties, thereby revolutionizing the pharmaceuticals and materials industries. However, these VAEs are hindered by the non-unique nature of SMILES strings and the computational cost of graph convolutions. To efficiently pass messages along all paths through the molecular graph, we encode multiple SMILES strings of a single molecule using a set of stacked recurrent neural networks, harmonizing hidden representations of each atom between SMILES representations, and use attentional pooling to build a final fixed-length latent representation. By then decoding to a disjoint set of SMILES strings of the molecule, our All SMILES VAE learns an almost bijective mapping between molecules and latent representations near the high-probability-mass subspace of the prior. Our SMILES-derived but molecule-based latent representations significantly surpass the state-of-the-art in a variety of fully- and semi-supervised property regression and molecular property optimization tasks.

Original Pdf: pdf

14 Replies

Loading