On improving experimental binding affinity predictions with synthetic data

Kevin Ryczko; Phyo Phyo Kyaw Zin; Jordan Crivelli-Decker; Ly Le; Punit K Jha; Benjamin J. Shields; Pablo Lemos; Sasaank Bandi; Maarten Van Damme; Martin Ganahl; Andrea Bortolato

Published: 02 Mar 2026, Last Modified: 05 Mar 2026, Venue: GEM 2026, License: CC BY 4.0

Keywords: binding affinity prediction, protein-ligand interaction, molecular graphs, drug discovery, geometric deep learning

TL;DR: We add computational chemistry data to SAIR, create several splits of the data relevant to drug discovery campaigns, and compare different modeling approaches for predicting experimental binding affinities of protein-ligand systems.

Abstract: The success of deep learning binding affinity prediction models depends critically on expanding experimental data with reliable synthetic data. We extend the Structurally Augmented IC50 Repository (SAIR) with physics-based computations and present two distinct data splits, SAIR-FEP and SAIR-OOD. For SAIR-FEP, we perform $\approx$80K absolute free energy perturbation (AFEP) calculations and curate two train/test splits to simulate realistic drug discovery scenarios. The free energy of binding and other physics-based computations are then used as input features. We compare the performance of proteochemometric and state-of-the-art structure-based deep learning models and show that including physics-based features improves predictions, and that the quality of the structure plays a key role in model performance. For SAIR-OOD, we remove SAIR entries that overlap with complexes in public-facing benchmarks and demonstrate that simultaneous training on synthetic and experimental data improves performance on public-facing experimental benchmarks.

Presenter: ~Kevin_Ryczko1

Format: Yes, the presenting author will attend in person if this work is accepted to the workshop.

Funding: No, the presenting author of this submission does not fall under ICLR’s funding aims, or has sufficient alternate funding.

Submission Number: 53
