CA-DEL: Construction and Application of an Intelligent DEL Database for Anti-Cancer Drug Discovery

11 Sept 2025 (modified: 12 Feb 2026) · ICLR 2026 Conference Desk Rejected Submission · CC BY 4.0
Keywords: DNA-Encoded Library (DEL), Drug Discovery, Anti-Cancer, Database, Cheminformatics
Abstract: The success of machine learning in drug discovery hinges on learning the relationship between a chemical structure and its biological activity. While DNA-Encoded Library (DEL) technology can generate the massive datasets this task requires, its primary signal, sequencing read counts, is an indirect and often noisy proxy for true molecular binding affinity. To address the scarcity of public benchmarks for developing models that are robust to this noise, we introduce CA-DEL, a multi-dimensional public benchmark featuring screens against three homologous carbonic anhydrase isoforms. Notably, it is the first public DEL dataset to integrate both 2D chemical structures and pre-computed 3D protein-ligand conformations. Crucially, CA-DEL includes a validation set with experimentally determined binding affinities ($K_i$ values). This enables direct evaluation of a model's ability to predict true biological activity, rather than merely modeling the noisy enrichment signal.
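The abstract does not say which metric is used to compare model predictions with the experimental $K_i$ values. One plausible approach, assumed here purely for illustration, is to convert each $K_i$ (in molar units) to $pK_i = -\log_{10} K_i$, so that larger values mean tighter binding, and then compute the Spearman rank correlation between model scores and $pK_i$. Rank correlation is insensitive to the arbitrary scale of enrichment-style scores. The function names (`to_pki`, `average_ranks`, `spearman`) are hypothetical, and this is a minimal standard-library sketch rather than the CA-DEL evaluation code.

```python
import math


def to_pki(ki_molar: float) -> float:
    """Convert an inhibition constant K_i (molar) to pK_i = -log10(K_i).

    Assumption: K_i is given in mol/L; higher pK_i means tighter binding.
    """
    if ki_molar <= 0:
        raise ValueError("K_i must be positive")
    return -math.log10(ki_molar)


def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical example values, not taken from CA-DEL.
ki_values = [1e-9, 1e-8, 1e-7, 1e-6]       # measured K_i in molar units
model_scores = [0.9, 0.7, 0.4, 0.1]        # a model's predicted activity scores
pki_values = [to_pki(k) for k in ki_values]
rho = spearman(model_scores, pki_values)
```

Here the model ranks the four compounds in the same order as their measured affinities, so `rho` equals 1.0. A score that simply tracked noisy read counts would be expected to show a weaker rank agreement.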
Primary Area: datasets and benchmarks
Submission Number: 3981