Pre-training on noncovalent interactions from synthetic protein-ligand structures to better predict binding affinity
Track: Track 1: Original Research/Position/Education/Attention Track
TL;DR: Pre-training on noncovalent interactions in protein-ligand complexes yields rich representations of their interaction profiles that can boost binding affinity prediction performance.
Abstract: Accurate prediction of protein-ligand binding affinity is a central challenge in computational drug discovery, yet existing graph neural network approaches typically represent protein-ligand complexes as homogeneous atom-level graphs, neglecting the role of aromatic ring systems and the rich hierarchy of noncovalent interactions that govern binding. In this work, we introduce the Protein-Ligand Interaction Pre-training (PLIP) approach, a heterogeneous equivariant graph transformer that explicitly encodes four node types (ligand atoms, protein atoms, ligand rings, and protein rings) connected by interaction-specific edge relations such as hydrogen bonds, hydrophobic contacts, π-π stacking, and cation-π interactions. We pretrain on interactions found in 5.1 million synthetic protein-ligand structures from the Structural and Interaction Repository (SAIR) using a multi-task self-supervised objective comprising interaction-type classification, interatomic distance regression, and binding affinity prediction. Systematic evaluation on four drug targets, namely acetylcholinesterase (AChE), SARS-CoV-2 main protease (SARS-Mpro), Zika protease, and μ-opioid receptor, demonstrates that multi-task pretraining on all three objectives (interaction type, distance, and affinity) achieves Pearson correlations of 0.667 on AChE (+26% over training from scratch), 0.490 on Zika (+6%), 0.374 on μ-opioid receptor (+32%), and 0.255 on SARS-Mpro (+39%). Comparisons against competing baselines demonstrate the benefits of pretraining on protein-ligand interactions for structure-based binding affinity prediction.
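The three-part pretraining objective named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the loss functions or their weighting, so everything here is an assumption made for illustration: cross-entropy for interaction-type classification, mean squared error for the distance and affinity terms, equal task weights, and the hypothetical names `INTERACTION_TYPES` and `multitask_loss`.

```python
import math

# Hypothetical label set, taken from the edge relations listed in the abstract.
INTERACTION_TYPES = ["hydrogen_bond", "hydrophobic", "pi_pi_stacking", "cation_pi"]


def cross_entropy(logits, target_index):
    """Cross-entropy of one edge's logits against its true class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target_index]


def mse(preds, targets):
    """Mean squared error over paired predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def multitask_loss(edge_logits, edge_labels, pred_dist, true_dist,
                   pred_aff, true_aff, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three task losses (assumed form, not the paper's)."""
    l_type = sum(cross_entropy(l, y)
                 for l, y in zip(edge_logits, edge_labels)) / len(edge_labels)
    l_dist = mse(pred_dist, true_dist)
    l_aff = mse(pred_aff, true_aff)
    w_type, w_dist, w_aff = weights  # equal weights are an assumption
    total = w_type * l_type + w_dist * l_dist + w_aff * l_aff
    return total, {"type": l_type, "distance": l_dist, "affinity": l_aff}


# One edge with uninformative logits, a perfect distance, and an affinity off by 1.
total, parts = multitask_loss(
    edge_logits=[[0.0, 0.0, 0.0, 0.0]],
    edge_labels=[INTERACTION_TYPES.index("pi_pi_stacking")],
    pred_dist=[3.0], true_dist=[3.0],
    pred_aff=[6.0], true_aff=[7.0],
)
```

Returning the per-task losses alongside the total makes it easy to ablate individual objectives by setting the corresponding weight to zero, which mirrors the comparison of pretraining objectives described above.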
Keywords: pretraining, binding affinity, noncovalent, synthetic, protein-ligand
Submission Number: 35