PRISM: A Hybrid Diffusion-Reinforcement Learning Framework for 3D Structure-based De Novo Design

Published: 02 Mar 2026, Last Modified: 15 Apr 2026 | GEM 2026 | CC BY 4.0
Keywords: Structure-based drug design, molecular diffusion models, reinforcement learning, 3D molecular generation, de novo drug design, multi-objective optimization, generative models for chemistry
TL;DR: Reinforcement learning fine-tunes pocket-aware 3D diffusion models to generate molecules that satisfy user-defined rewards.
Abstract: Structure-based diffusion models offer a promising route for de novo 3D ligand generation directly within protein binding sites, but generating stereochemically valid molecules that also satisfy the molecular property constraints required for drug design remains challenging. Existing approaches typically rely on inference-time guidance, which limits flexibility and prevents medicinal chemists from directly specifying design objectives. We introduce PRISM (Pocket Reinforced Iterative Structure-based Molecular diffusion), a reinforcement learning framework for fine-tuning structure-based diffusion models using Proximal Policy Optimization (PPO). PRISM enables user-defined rewards to be incorporated directly into the generative process. We evaluate PRISM across six well-studied drug targets and systematically study single-objective, multi-objective, and curriculum-based optimization strategies. PRISM consistently improves 3D geometric validity, demonstrating that reinforcement learning can effectively shape diffusion models in continuous coordinate space. Extending to multi-objective rewards highlights how reward design and reward density influence optimization, while a staged curriculum anchored in geometric validity stabilizes training and supports the integration of more complex medicinal chemistry objectives. PRISM is lightweight and practical, requiring only a single GPU and a few hours of training to fine-tune a model toward a desired reward. Together, these results establish reinforcement learning as a flexible and accessible tool for optimizing structure-based molecular diffusion, enabling rapid experimentation with custom reward functions for 3D molecular design.
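For readers unfamiliar with PPO, the standard clipped surrogate objective that the abstract refers to can be sketched as follows. This is a generic, minimal illustration and not PRISM's implementation: the function name `ppo_clipped_objective`, the default `clip_eps` of 0.2, and the example inputs are assumptions, and the abstract does not specify how probability ratios or advantages are computed over the diffusion model's denoising steps.

```python
from typing import Sequence


def ppo_clipped_objective(
    ratios: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,  # assumed default; not taken from the abstract
) -> float:
    """Mean PPO clipped surrogate objective (to be maximized).

    ratios[i]     : new-policy probability / old-policy probability for sample i
    advantages[i] : advantage estimate for sample i (e.g. derived from a reward)
    """
    if len(ratios) != len(advantages) or len(ratios) == 0:
        raise ValueError("ratios and advantages must be non-empty and equal length")

    total = 0.0
    for r, a in zip(ratios, advantages):
        # Clip the ratio to [1 - eps, 1 + eps] to limit the size of the policy update.
        clipped = min(max(r, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(r * a, clipped * a)
    return total / len(ratios)


# Hypothetical usage: three samples with made-up ratios and advantages.
objective = ppo_clipped_objective([1.5, 0.5, 1.0], [1.0, -1.0, 2.0])
```

Clipping removes the incentive to move the fine-tuned policy far from the one that generated the samples, which is one reason PPO is commonly chosen for fine-tuning a pretrained generative model.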
Submission Number: 95