Domain-Adapted Diffusion Model for PROTAC Linker Design Through the Lens of Density Ratio in Chemical Space

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0
TL;DR: We present a domain-adapted diffusion model for the PROTAC linker design task.
Abstract:

Proteolysis-targeting chimeras (PROTACs) are a groundbreaking technology for targeted protein degradation, but designing effective linkers that connect two molecular fragments into a drug-candidate PROTAC remains a key challenge. While diffusion models show promise in molecular generation, current diffusion models for PROTAC linker design are typically trained on small-molecule datasets, which introduces a distribution mismatch in chemical space between small molecules and the target PROTACs. Directly fine-tuning on the limited available PROTAC data often leads to overfitting and poor generalization. In this work, we propose DAD-PROTAC, a domain-adapted diffusion model for PROTAC linker design, which addresses this distribution mismatch through density ratio estimation, bridging the gap between the small-molecule and PROTAC domains. By decomposing the target score estimator into a pre-trained score function and a lightweight score correction term, DAD-PROTAC achieves efficient fine-tuning without full retraining. Experimental results demonstrate its superior ability to generate high-quality PROTAC linkers.
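The abstract's two technical ideas, density ratio estimation and splitting the target score into a pre-trained score plus a correction, both rest on a simple identity: if p_T(x) = p_S(x) r(x) with r = p_T / p_S, then ∇ log p_T(x) = ∇ log p_S(x) + ∇ log r(x). The minimal sketch below illustrates that identity on toy 1-D data and estimates log r with the standard classifier-based density-ratio trick (logistic regression fitted by Newton's method). It is not DAD-PROTAC's implementation: the Gaussian "domains", the quadratic features, and every function name (`fit_log_ratio`, `score_correction`, and so on) are hypothetical stand-ins. In a diffusion model at noise level t > 0, the exact correction becomes ∇ log E[r(x_0) | x_t] under the source posterior, which this t = 0 sketch does not model.

```python
import numpy as np

# Hypothetical 1-D stand-ins for the two chemical-space domains:
# source ("small molecules") ~ N(0, 1), target ("PROTACs") ~ N(1, 0.8^2).
MU_T, SIGMA_T = 1.0, 0.8


def source_score(x):
    """Score of the 'pre-trained' source model: d/dx log N(x; 0, 1)."""
    return -x


def target_score_exact(x):
    """Ground-truth target score, used only to check the sketch."""
    return -(x - MU_T) / SIGMA_T**2


def features(x):
    # Quadratic features make the classifier well specified: the log ratio
    # of two Gaussians is a quadratic polynomial in x.
    return np.stack([np.ones_like(x), x, x**2], axis=1)


def sigmoid(z):
    # tanh form avoids overflow in exp for large |z|.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def fit_log_ratio(x_src, x_tgt, n_iter=25, ridge=1e-8):
    """Classifier-based density ratio estimation (hypothetical helper).

    Fits logistic regression (label 1 = target) by Newton's method. With equal
    sample sizes, the fitted logit approximates log r(x) = log p_T(x) - log p_S(x).
    """
    x = np.concatenate([x_src, x_tgt])
    y = np.concatenate([np.zeros_like(x_src), np.ones_like(x_tgt)])
    phi = features(x)
    w = np.zeros(phi.shape[1])
    for _ in range(n_iter):
        p = sigmoid(phi @ w)
        grad = phi.T @ (p - y) / len(y)
        # Hessian of the mean negative log-likelihood, plus a tiny ridge.
        hess = (phi.T * (p * (1.0 - p))) @ phi / len(y) + ridge * np.eye(len(w))
        step = np.linalg.solve(hess, grad)
        w -= step
        if np.max(np.abs(step)) < 1e-10:
            break
    return w


def score_correction(x, w):
    """Gradient of the fitted log ratio: d/dx (w0 + w1 x + w2 x^2)."""
    return w[1] + 2.0 * w[2] * x


def corrected_score(x, w):
    """Target score estimate = pre-trained source score + lightweight correction."""
    return source_score(x) + score_correction(x, w)


rng = np.random.default_rng(0)
n = 50_000  # equal sizes, so the class-prior term log(n_T / n_S) vanishes
x_src = rng.normal(0.0, 1.0, n)
x_tgt = rng.normal(MU_T, SIGMA_T, n)
w = fit_log_ratio(x_src, x_tgt)

grid = np.linspace(-0.5, 1.5, 5)
print(np.round(corrected_score(grid, w), 3))
print(np.round(target_score_exact(grid), 3))
```

Only the three classifier weights are learned here; the source score is reused unchanged, which mirrors, in miniature, the idea of correcting a frozen pre-trained score with a small learned term rather than retraining the whole model.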

Lay Summary:

PROTACs are a promising new type of drug that works by bringing a disease-causing protein together with the cell’s natural "recycling" system, which then destroys the unwanted protein. To make a PROTAC, chemists must connect two functional parts with a chemical chain, but finding the right chain (linker) is very challenging. Existing AI tools for generating molecules, called diffusion models, are usually trained on small molecules rather than on complex PROTAC molecules. As a result, these models often struggle to design effective linkers for PROTACs when only limited examples are available. We introduce a new approach that bridges this gap. Instead of retraining a large model from scratch, we adjust a pre-trained model so it learns what makes PROTAC-like molecules different.

Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Primary Area: Applications->Health / Medicine
Keywords: PROTAC, Linker Design, Diffusion Models
Submission Number: 13097