Pre-training on noncovalent interactions from synthetic protein-ligand structures to better predict binding affinity

Published: 28 May 2026, Last Modified: 28 May 2026
Venue: ICML 2026 FM4LS Workshop Poster
License: CC BY 4.0
Keywords: pretraining, binding affinity, noncovalent, synthetic, protein-ligand
TL;DR: Pre-training on noncovalent interactions in protein-ligand complexes yields rich representations of their interaction profiles that can boost binding affinity prediction performance.
Abstract: Accurate prediction of protein-ligand binding affinity is a central challenge in computational drug discovery, yet existing graph neural network approaches typically represent protein-ligand complexes as homogeneous atom-level graphs, neglecting the role of aromatic ring systems and the rich hierarchy of noncovalent interactions that govern binding. In this work, we introduce Protein-Ligand Interaction Pre-training (PLIP), an approach built on a heterogeneous equivariant graph transformer that explicitly encodes four node types (ligand atoms, protein atoms, ligand rings, and protein rings) connected by interaction-specific edge relations such as hydrogen bonds, hydrophobic contacts, π-π stacking, and cation-π interactions. We pre-train on interactions found in 5.1 million synthetic protein-ligand structures from the Structural and Interaction Repository (SAIR) using a multi-task self-supervised objective comprising interaction-type classification, interatomic distance regression, and binding affinity prediction. Systematic evaluation on four drug targets, namely acetylcholinesterase (AChE), SARS-CoV-2 main protease (SARS-Mpro), Zika protease, and μ-opioid receptor, shows that pre-training jointly on all three objectives (interaction type, distance, and affinity) achieves Pearson correlations of 0.667 on AChE (+26% over training from scratch), 0.490 on Zika protease (+6%), 0.374 on μ-opioid receptor (+32%), and 0.255 on SARS-Mpro (+39%). Comparisons against competing baselines further demonstrate the benefits of pre-training on protein-ligand interactions for structure-based binding affinity prediction.
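Illustrative sketch (not the authors' implementation): the abstract describes a multi-task objective that combines interaction-type classification, interatomic distance regression, and binding affinity prediction, evaluated by Pearson correlation. One simple way to picture this is a weighted sum of a cross-entropy term and two mean-squared-error terms. The function names, the equal loss weights, and the choice of MSE for the regression terms are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one edge's interaction-type logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def mse(preds, targets):
    """Mean squared error over paired predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def multitask_loss(type_logits, type_labels, dist_pred, dist_true,
                   aff_pred, aff_true, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three pre-training terms.

    The weights are hypothetical; equal weighting is an assumption.
    """
    w_type, w_dist, w_aff = weights
    ce = sum(cross_entropy(l, y) for l, y in zip(type_logits, type_labels))
    ce /= len(type_labels)
    return (w_type * ce
            + w_dist * mse(dist_pred, dist_true)
            + w_aff * mse(aff_pred, aff_true))


def pearson(xs, ys):
    """Pearson correlation, the evaluation metric reported in the abstract."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy example: two edges, each scored over four hypothetical interaction
# classes (e.g. hydrogen bond, hydrophobic contact, pi-pi, cation-pi).
loss = multitask_loss(
    type_logits=[[2.0, 0.1, 0.1, 0.1], [0.1, 0.1, 3.0, 0.1]],
    type_labels=[0, 2],
    dist_pred=[3.1, 4.0], dist_true=[3.0, 4.2],  # distances, arbitrary units
    aff_pred=[6.5], aff_true=[7.0],              # affinity, arbitrary units
)
```

In practice the relative weighting of the three terms is a design choice that would need tuning; the sketch only shows how the terms could be combined into a single scalar.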
Submission Number: 30