Revolutionize drug discovery with dense PPI data

Published: 24 Sept 2025, Last Modified: 15 Oct 2025. NeurIPS2025-AI4Science Poster. CC BY 4.0
Track: Track 2: Dataset Proposal Competition
Keywords: Protein–protein interactions, Foundation models, Affinity prediction, Therapeutic, Drug discovery, Antibody
TL;DR: We propose dense PPI datasets to train interaction-aware foundation models. Unlike sparse data, dense sampling captures mutational effects, enabling accurate affinity prediction with broad impact on drug discovery, diagnostics, and synthetic biology.
Abstract: Drug development faces persistent tradeoffs between efficacy, safety, and developability, but existing foundation models cannot reliably predict binding affinity, the central challenge for therapeutic design. This limitation stems from sparse protein–protein interaction (PPI) datasets, which largely reflect natural protein pairs and encourage memorization rather than generalization. We propose dense PPI datasets that systematically sample mutational neighborhoods, compelling models to learn transferable interaction principles. Scalable fluorescence-activated cell sorting (FACS) combined with sequencing can generate billions of labeled data points at reasonable cost. These datasets would enable PPI-specific foundation models with accurate affinity prediction, improved structure modeling, and efficient exploration of interaction-aware sequence landscapes, with transformative impact on drug discovery, diagnostics, synthetic biology, and the broader life sciences.
Submission Number: 255