Impacts of Data and Models on Unsupervised Pre-training for Molecular Property Prediction

Published: 27 Oct 2023, Last Modified: 11 Dec 2023AI4Mat-2023 PosterEveryoneRevisionsBibTeX
Submission Track: Papers
Submission Category: AI-Guided Design
Keywords: Unsupervised pre-training, molecular property prediction
Abstract: The available labeled data to support molecular property prediction are limited in size due to experimental time and cost requirements. However, unsupervised learning techniques can leverage vast databases of molecular structures, thus significantly expanding the scope of training data. We compare the effectiveness of pre-training data and modeling choices to support the downstream task of molecular aqueous solubility prediction. We also compare the global and local structure of the learned latent spaces to probe the properties of effective pre-training approaches. We find that the pre-training modeling choices affect predictive performance and the latent space structure much more than the data choices.
Submission Number: 14
Loading