Mapping the intermolecular interaction universe through self-supervised learning on molecular crystals
Keywords: molecular modeling, intermolecular interactions, protein-ligand binding, equivariant graph neural network, self-supervised pre-training, geometric ML, molecular crystals
Abstract: Molecular interactions fundamentally influence all aspects of chemistry and biology. Prevailing machine learning approaches model molecules in isolation or, at best, provide limited modeling of molecular interactions, typically restricted to protein-ligand and protein-protein interactions. Here, we show how molecular crystals can be used to define MolInteractDB, a dataset containing valuable biochemical knowledge that large self-supervised pre-trained models can capture. MolInteractDB incorporates 344,858 molecular crystal structure entries from the Cambridge Structural Database. We formulate entries in MolInteractDB as radial patches of flexible size, taken at varying positions in the crystal, to represent intermolecular interactions across crystal structures. We characterize the variety of interactions found across 6 million patches. Leveraging MolInteractDB, we develop InteractNN, a self-supervised SE(3)-equivariant 3D message passing network. We show that InteractNN captures latent knowledge of chemical elements and of intermolecular interaction types at a scale not directly accessible to human scientists. To demonstrate its potential, we fine-tune InteractNN to predict the binding affinity between proteins and ligands, obtaining results comparable to state-of-the-art models.
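To make the idea of a radial patch concrete, the following is a minimal sketch and not the authors' implementation. It assumes a patch is simply the set of atoms within a chosen radius of a chosen center, and it ignores periodic images of the unit cell, which a real crystal pipeline would need to handle. The function name `radial_patch`, the `Atom` type, and the toy coordinates are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical atom record: (element symbol, Cartesian coordinates in angstroms).
Atom = Tuple[str, Tuple[float, float, float]]


def radial_patch(
    atoms: Sequence[Atom],
    center: Tuple[float, float, float],
    radius: float,
) -> List[Atom]:
    """Return the atoms lying within `radius` of `center`.

    Assumption: periodic boundary conditions are ignored; only the
    atoms explicitly listed in `atoms` are considered.
    """
    return [atom for atom in atoms if math.dist(atom[1], center) <= radius]


# Toy coordinates for illustration only (not taken from any real structure).
atoms = [
    ("O", (0.0, 0.0, 0.0)),
    ("H", (0.96, 0.0, 0.0)),
    ("N", (2.9, 0.0, 0.0)),
    ("C", (6.0, 0.0, 0.0)),
]

# Same center, two radii: the patch size is flexible.
small = radial_patch(atoms, center=(0.0, 0.0, 0.0), radius=1.5)
large = radial_patch(atoms, center=(0.0, 0.0, 0.0), radius=3.5)
```

Varying `center` and `radius` over a structure is one plausible way to generate many patches per crystal entry; how the paper actually samples positions and sizes is not specified in the abstract.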
Submission Track: Original Research
Submission Number: 168