SMILES2vec: Predicting Chemical Properties from Text Representations
Garrett B. Goh, Nathan Hodas, Charles Siegel, Abhinav Vishnu
Feb 12, 2018 (modified: Feb 12, 2018), ICLR 2018 Workshop Submission, readers: everyone
Abstract: Chemical databases store information in text representations, and the SMILES format is a universal standard used in many cheminformatics software packages. Encoded in each SMILES string is structural information that can be used to predict complex chemical properties. In this work, we develop SMILES2vec, a deep RNN that automatically learns features from SMILES strings to predict a broad range of chemical properties, including toxicity, activity, solubility and solvation energy. Furthermore, we trained an interpretability mask for SMILES2vec solubility prediction, which identifies specific parts of a chemical in a manner consistent with ground-truth knowledge, with an accuracy of 88%, demonstrating that neural networks can learn technically accurate chemical concepts.
TL;DR: SMILES2vec: an RNN that reads chemical text representations to predict chemical properties and learns about real chemistry in the process.
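
To make "reads chemical text representations" concrete, the sketch below shows one plausible way a SMILES string could be turned into a fixed-length integer sequence before being fed to an RNN. It is an illustration only, not the authors' preprocessing: the character-level vocabulary, the padding index 0, the `max_len` value, and the helper names `build_vocab` and `encode` are all assumptions made for this example.

```python
def build_vocab(smiles_list):
    """Map each distinct character to an integer index (assumed scheme).

    Index 0 is reserved for padding, so real characters start at 1.
    """
    chars = sorted({ch for s in smiles_list for ch in s})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(smiles, vocab, max_len):
    """Convert a SMILES string to a list of indices, truncated or zero-padded to max_len."""
    ids = [vocab[ch] for ch in smiles[:max_len]]
    return ids + [0] * (max_len - len(ids))


# Three well-known molecules: ethanol, benzene, acetic acid.
examples = ["CCO", "c1ccccc1", "CC(=O)O"]
vocab = build_vocab(examples)
encoded = [encode(s, vocab, max_len=10) for s in examples]

for smiles, ids in zip(examples, encoded):
    print(smiles, ids)
```

Each integer sequence would typically be passed through an embedding layer and then the recurrent layers. Note that a purely character-level scheme like this one splits two-letter element symbols such as `Cl` and `Br` into separate tokens; whether that matters depends on the model and dataset.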