Keywords: Molecular interactions, representation learning, self-supervised learning, all-atom encoder, proteins, nucleic acids, small molecules
TL;DR: A model to universally represent molecular interactions (for proteins, nucleic acids, small molecules, and ions) at an all-atom scale.
Abstract: Molecular interactions underlie nearly all biological processes. However, most machine learning models treat molecules in isolation or specialize in a single type of interaction, which prevents generalization across biomolecular classes and limits the ability to systematically model interaction interfaces. We introduce ATOMICA, a geometric deep learning model that learns atomic-scale representations of intermolecular interfaces across diverse biomolecular modalities, including small molecules, metal ions, amino acids, and nucleic acids. ATOMICA uses a self-supervised denoising and masking objective to train on 2,037,972 interaction complexes and generate hierarchical embeddings at the levels of atoms, chemical blocks, and molecular interfaces. The model learns generalizable representations across molecular classes. We apply ATOMICA to the interfaceome and show that proteins that interact similarly with ions, small molecules, nucleic acids, lipids, and proteins tend to be involved in the same disease. We then construct five modality-specific interfaceome networks termed ATOMICANets, which connect proteins based on interaction interface similarity. These networks identify disease pathways across 27 conditions. Finally, we use ATOMICA to annotate the dark proteome—proteins lacking known structure or function—by predicting 2,646 previously uncharacterized ligand-binding sites for metal ions and cofactors.
Submission Number: 46