Repurposing Density Functional Theory to Suit Deep Learning

Published: 28 Jul 2023, Last Modified: 28 Jul 2023, Venue: SynS & ML @ ICML2023
Keywords: dataset creation, quantum chemistry, density functional theory, neural networks
TL;DR: People train NNs on quantum chemistry datasets; we introduce a library to generate larger datasets.
Abstract: Density Functional Theory (DFT) accurately predicts the properties of molecules given their atom types and positions, and often serves as ground truth for molecular property prediction tasks. Neural Networks (NNs) are popular tools for such tasks and are trained on DFT datasets, with the aim of approximating DFT at a fraction of the computational cost. Research in other areas of machine learning has shown that the generalisation performance of NNs tends to improve with increased dataset size; however, the computational cost of DFT limits the size of DFT datasets. We present PySCFIPU, a DFT library that allows us to iterate on both dataset generation and NN training. We create QM10X, a dataset with 100M conformers, in 13 hours, on which we subsequently train SchNet in 12 hours. We show that the predictions of SchNet improve solely by increasing the amount of training data, without incorporating further inductive biases.
Submission Number: 17