Keywords: Drug Discovery, Heterobifunctional Molecule, Linker Design, Dataset
TL;DR: We present HBDrug3D, the first curated 3D benchmark for heterobifunctional drug linkers, standardizing data, conformations, and evaluation to advance AI-driven PROTAC, ADC, and PDC design.
Abstract: As a vital branch of fragment-based drug design (FBDD), linker design generates molecular bridges that fuse two fragments into a complete compound. It has attracted widespread interest in AI-driven discovery, particularly for heterobifunctional modalities such as PROTACs, ADCs, and PDCs. Linkers for these platforms must be custom-engineered for each biological mechanism, demanding geometric precision and physicochemical profiles far beyond those of conventional small-molecule linkers. However, existing PROTAC, ADC, and PDC datasets remain nonstandardized, lack high-quality conformations, and rely on inconsistent evaluation protocols and metrics, which hampers robust model development. To address these gaps, we introduce HBDrug3D, the first benchmark dataset for heterobifunctional drug linker design. First, we aggregated and stringently filtered raw data from three sources. Second, we harmonized storage formats and usage conventions across both chemical and engineering domains to establish a unified data-representation space, then generated low-energy conformations with the OPLS4 force field and filtered out redundant or invalid structures. Third, we used our programmatic evaluation pipeline to survey diverse metrics, define HBDrug3D-specific criteria, and benchmark state-of-the-art models. Results and case studies demonstrate that, while current methods can produce valid heterobifunctional linkers, substantial room for improvement remains in overall performance and cross-modality robustness. Finally, we release an open-source codebase covering data preprocessing, model training, sampling, and evaluation to lower adoption barriers and spur further research.
Primary Area: datasets and benchmarks
Submission Number: 24100