Reward-Focused Fine-tuning of Pocket-aware Diffusion Models via Direct Preference Optimization

ICLR 2026 Conference Submission 13431 Authors

18 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Structure-Based Drug Design, Diffusion, Direct Preference Optimization, Feedback
Abstract: Diffusion models have recently shown promise for structure-based drug design (SBDD). Yet, how to effectively fine-tune these models to generate more desirable drug-like molecules remains challenging, given the relative scarcity of pocket-ligand data. Building on the recent success of aligning diffusion models with preference data, we introduce Reward-Focused Fine-Tuning (RFFT), a novel framework for fine-tuning pretrained pocket-aware diffusion models with direct preference optimization (DPO). Using a reward score to rank ligand pairs self-generated by the pretrained model, RFFT constructs winner-loser pairs as feedback and fine-tunes the model with DPO. The process can be repeated iteratively for continued improvement. To illustrate its effectiveness, we apply RFFT to fine-tune TargetDiff, a diffusion model recently proposed for SBDD. Our empirical results demonstrate that the fine-tuned TargetDiff-RFFT achieves substantial improvement in generation quality. Its performance is also highly competitive with existing state-of-the-art baselines, ranking first in chemical property analysis and second in binding affinity analysis. Surprisingly, our substructure analysis shows that RFFT not only preserves but actually enhances the model's fidelity to the real data distribution.
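The abstract describes the RFFT loop only at a high level. The sketch below illustrates one plausible round of reward-ranked pair construction followed by a Diffusion-DPO-style update; it is an illustrative reading of the abstract, not the authors' implementation. The callables `sample_fn`, `reward_fn`, and `loss_fn` are hypothetical placeholders, and the use of a per-sample denoising loss as a proxy for the model's log-likelihood is an assumption borrowed from Diffusion-DPO, not a detail confirmed by this submission.

```python
# Minimal sketch (assumptions noted above) of one RFFT round: sample ligands from the
# current model, rank them with a reward score to form winner-loser pairs, then apply
# a DPO-style update against a frozen reference copy of the model.
import copy
import torch
import torch.nn.functional as F


def build_preference_pairs(model, pockets, sample_fn, reward_fn, n_per_pocket=4):
    """Self-generate ligands per pocket and pair the highest-reward one against the lowest."""
    pairs = []
    for pocket in pockets:
        ligands = [sample_fn(model, pocket) for _ in range(n_per_pocket)]
        ranked = sorted(ligands, key=reward_fn, reverse=True)
        pairs.append((pocket, ranked[0], ranked[-1]))  # (pocket, winner, loser)
    return pairs


def dpo_loss(model, ref_model, loss_fn, pocket, winner, loser, beta=0.1):
    """DPO objective with negative denoising loss standing in for log-likelihood (assumption)."""
    logp_w = -loss_fn(model, pocket, winner)
    logp_l = -loss_fn(model, pocket, loser)
    with torch.no_grad():
        ref_logp_w = -loss_fn(ref_model, pocket, winner)
        ref_logp_l = -loss_fn(ref_model, pocket, loser)
    # Push the policy toward the winner relative to the frozen reference.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -F.logsigmoid(margin)


def rfft_round(model, pockets, sample_fn, reward_fn, loss_fn, lr=1e-5):
    """One iteration of the loop; calling it again on the returned model gives the next round."""
    ref_model = copy.deepcopy(model).eval()  # reference = model at the start of this round
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for pocket, winner, loser in build_preference_pairs(model, pockets, sample_fn, reward_fn):
        loss = dpo_loss(model, ref_model, loss_fn, pocket, winner, loser)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model
```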
Primary Area: generative models
Submission Number: 13431