Fine-Tuning Discrete Diffusion Models via Reward Optimization with Applications to DNA and Protein Design
Keywords: Discrete Diffusion Models, Reward Optimization, Fine-Tuning, AI for science, Reinforcement learning
TL;DR: We propose an approach for fine-tuning discrete diffusion models to generate "natural" sequences optimized for specific tasks, addressing algorithmic and theoretical challenges of discrete domains, and showing success in DNA and protein design.
Abstract: Recent studies have demonstrated the strong empirical performance of diffusion models on discrete sequences (i.e., discrete diffusion models) across domains such as natural language and biological sequence generation. For example, in the protein inverse folding task, where the goal is to generate a protein sequence from a given backbone structure, conditional diffusion models have achieved impressive results in generating "natural" sequences that fold back into the original structure. However, practical design tasks often require not only modeling a conditional distribution but also optimizing specific task objectives. For instance, in the inverse folding task, we may prefer proteins with high stability. To address this, we consider the scenario where we have pre-trained discrete diffusion models that can generate "natural" sequences, as well as reward models that map sequences to task objectives. We then formulate the reward maximization problem within discrete diffusion models, analogous to reinforcement learning (RL), while minimizing the KL divergence against the pre-trained diffusion models to preserve naturalness. To solve this RL problem, we propose a novel algorithm that enables direct backpropagation of rewards through entire trajectories generated by diffusion models, by making the originally non-differentiable trajectories differentiable via the Gumbel-Softmax trick. Our theoretical analysis indicates that our approach can generate sequences that are both "natural" (i.e., have a high probability under a pre-trained model) and yield high rewards. While similar tasks have recently been explored in diffusion models for continuous domains, our work addresses unique algorithmic and theoretical challenges specific to discrete diffusion models, which arise from their foundation in continuous-time Markov chains rather than Brownian motion. Finally, we demonstrate the effectiveness of our algorithm in generating DNA and protein sequences that optimize enhancer activity and protein stability, respectively, which are important tasks for gene therapies and protein-based therapeutics. The code is available at https://github.com/ChenyuWang-Monica/DRAKES.
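To make the core algorithmic idea concrete, the following is a minimal, simplified PyTorch sketch of reward backpropagation through a sampled discrete trajectory using the straight-through Gumbel-Softmax trick, with a per-step KL surrogate against the frozen pre-trained model. The names (`finetuned_model`, `pretrained_model`, `reward_model`), the uniform initialization, and the discrete-time loop are illustrative assumptions for exposition only, not the DRAKES implementation, which is formulated over continuous-time Markov chain trajectories.

```python
# Hedged sketch: differentiable sampling from a discrete diffusion-style denoiser
# and end-to-end reward backpropagation with a KL penalty to the pre-trained model.
import torch
import torch.nn.functional as F

def finetune_step(finetuned_model, pretrained_model, reward_model,
                  seq_len, vocab_size, num_steps,
                  tau=1.0, alpha=0.1, batch_size=8, device="cpu"):
    # Relaxed one-hot state, initialized to a uniform distribution over the vocabulary.
    x = torch.full((batch_size, seq_len, vocab_size), 1.0 / vocab_size, device=device)
    kl = torch.zeros(batch_size, device=device)

    for t in reversed(range(num_steps)):
        t_batch = torch.full((batch_size,), t, device=device)
        logits = finetuned_model(x, t_batch)            # (B, L, V) denoising logits
        with torch.no_grad():
            ref_logits = pretrained_model(x, t_batch)   # frozen pre-trained reference

        # Straight-through Gumbel-Softmax: discrete-looking samples in the forward
        # pass, differentiable surrogate gradients in the backward pass.
        x = F.gumbel_softmax(logits, tau=tau, hard=True)

        # Per-step KL surrogate against the pre-trained model to preserve "naturalness".
        log_p = F.log_softmax(logits, dim=-1)
        log_q = F.log_softmax(ref_logits, dim=-1)
        kl = kl + (log_p.exp() * (log_p - log_q)).sum(dim=(-1, -2))

    # The reward is evaluated on the relaxed final sequence and backpropagated
    # through the entire sampled trajectory.
    reward = reward_model(x)                            # (B,)
    loss = -(reward - alpha * kl).mean()
    loss.backward()
    return loss.detach()
```

In this sketch, `alpha` plays the role of the KL-regularization weight and `tau` is the Gumbel-Softmax temperature; lower temperatures give samples closer to discrete one-hot sequences at the cost of higher-variance gradients.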
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8496