SMILES2vec: Predicting Chemical Properties from Text Representations

12 Feb 2018 (modified: 05 May 2023), ICLR 2018 Workshop Submission, Readers: Everyone
Abstract: Chemical databases store information in text representations, and the SMILES format is a universal standard used by many cheminformatics software packages. Each SMILES string encodes structural information that can be used to predict complex chemical properties. In this work, we develop SMILES2vec, a deep RNN that automatically learns features from SMILES strings to predict a broad range of chemical properties, including toxicity, activity, solubility and solvation energy. Furthermore, we trained an interpretability mask for SMILES2vec solubility prediction. The mask identifies specific parts of a chemical in a way that is consistent with ground-truth knowledge, with an accuracy of 88%, demonstrating that neural networks can learn technically accurate chemical concepts.
TL;DR: SMILES2vec: An RNN that reads text representations of chemicals to predict chemical properties and learns about real chemistry in the process.
Keywords: Deep Neural Network, Recurrent Neural Network, Natural Language Processing, Cheminformatics, Chemistry
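
To make concrete what "reading" a SMILES string could mean for an RNN, here is a minimal sketch of turning SMILES strings into fixed-length integer sequences. It is an illustration under stated assumptions, not the preprocessing used in the paper: the character-level tokenization, the padding index `PAD`, the helper names `build_vocab` and `encode`, and the `max_len` value are all hypothetical choices made for this example.

```python
from typing import Dict, List

PAD = 0  # index reserved for padding (an assumption of this sketch)


def build_vocab(smiles_list: List[str]) -> Dict[str, int]:
    """Map each distinct character to an integer index, starting at 1."""
    chars = sorted({ch for s in smiles_list for ch in s})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(smiles: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Integer-encode one SMILES string, truncating or right-padding to max_len."""
    ids = [vocab[ch] for ch in smiles[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


# Ethanol, benzene, acetic acid
examples = ["CCO", "c1ccccc1", "CC(=O)O"]
vocab = build_vocab(examples)
encoded = [encode(s, vocab, max_len=10) for s in examples]

for smiles, ids in zip(examples, encoded):
    print(smiles, ids)
```

Sequences like these would typically be fed to an embedding layer followed by recurrent layers. Note that splitting character by character breaks two-letter atoms such as `Cl` and `Br` into separate symbols, so a production tokenizer may want to treat them as single tokens.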
