Proposal for a Large-scale High-quality Dataset of Activity Cliffs

Published: 24 Sept 2025, Last Modified: 15 Oct 2025
Venue: NeurIPS2025-AI4Science Poster
Readers: Everyone
License: CC BY 4.0
Track: Track 2: Dataset Proposal Competition
Keywords: Structure-based Drug Discovery, Activity Cliffs, Binding Affinity Prediction, Protein-ligand Docking
Abstract: Activity cliffs, pairs of structurally similar molecules that display large differences in binding affinity, pose a fundamental challenge in structure-based drug discovery. They highlight subtle yet critical determinants of protein-ligand recognition and provide stringent test cases for computational methods. This proposal aims to establish a large-scale, high-quality dataset of activity cliffs to enable systematic study of structure-activity discontinuities and to benchmark both affinity prediction and docking approaches. We have already curated a large-scale dataset containing over 16k activity cliff pairs across 50 human protein targets from ChEMBL. Future development will focus on validating affinity data under unified experimental conditions and integrating structural annotations of representative molecular pairs. The long-term goal is to develop an open, community-driven database of activity cliffs that will accelerate method development and provide actionable insights for drug discovery.
Submission Number: 67
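
The abstract defines an activity cliff as a pair of structurally similar molecules with a large difference in binding affinity. The sketch below illustrates that definition only; it is not the curation pipeline used for the proposed dataset. Everything in it is an assumption made for illustration: fingerprints are represented as plain sets of bit indices, similarity is Tanimoto, affinity is a negative-log value such as pKi, and the thresholds (similarity of at least 0.9, affinity difference of at least 2 log units) and the function names are hypothetical.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_activity_cliffs(compounds, min_similarity=0.9, min_delta=2.0):
    """Return pairs that are similar in structure but far apart in affinity.

    compounds: list of (name, fingerprint_bits, p_affinity) tuples measured
    against a single target. Both thresholds are illustrative assumptions.
    """
    cliffs = []
    for (n1, fp1, p1), (n2, fp2, p2) in combinations(compounds, 2):
        sim = tanimoto(fp1, fp2)
        delta = abs(p1 - p2)
        if sim >= min_similarity and delta >= min_delta:
            cliffs.append((n1, n2, sim, delta))
    return cliffs


# Toy data: made-up fingerprints and affinities, not taken from ChEMBL.
compounds = [
    ("mol_a", frozenset(range(0, 20)), 8.5),
    ("mol_b", frozenset(range(1, 20)), 5.9),   # shares 19 of 20 bits with mol_a
    ("mol_c", frozenset(range(10, 30)), 8.4),  # structurally dissimilar
]
cliffs = find_activity_cliffs(compounds)
for name1, name2, sim, delta in cliffs:
    print(f"{name1} / {name2}: similarity={sim:.2f}, delta={delta:.1f}")
```

In practice the fingerprints would come from a cheminformatics toolkit and the pairwise loop would be restricted to compounds measured against the same target, since a cliff is only meaningful within one protein-ligand system.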