Keywords: discrete diffusion, fine-tuning, reinforcement learning, multi-objective optimization, AI for science, reward optimization, biological sequence design
TL;DR: We introduce TR2-D2, a framework that uses tree search to optimize trajectories of discrete diffusion models to construct replay buffers for trajectory-aware fine-tuning under single- or multi-objective rewards.
Abstract: Reinforcement learning with stochastic optimal control offers a promising framework for diffusion fine-tuning, where a pre-trained diffusion model is optimized to generate paths that lead to a reward-tilted distribution. While these approaches enable optimization without access to explicit samples from the optimal distribution, they require training on rollouts under the current fine-tuned model, making them susceptible to reinforcing sub-optimal, low-reward trajectories. To overcome this challenge, we introduce **TR**ee-Search Guided **TR**ajectory-Aware Fine-Tuning for **D**iscrete **D**iffusion (**TR2-D2**), a novel framework that optimizes reward-guided discrete diffusion trajectories with tree search to construct replay buffers for trajectory-aware fine-tuning. These buffers are generated using Monte Carlo Tree Search (MCTS) and subsequently used to fine-tune a pre-trained discrete diffusion model under a stochastic optimal control objective. We validate our framework on single- and multi-objective fine-tuning of biological sequence diffusion models, demonstrating the effectiveness of TR2-D2 for reliable reward-guided fine-tuning in discrete sequence generation.
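To make the buffer-construction idea concrete, here is a minimal, self-contained sketch of using MCTS to collect high-reward discrete sequences into a replay buffer. It is a toy illustration only, not the paper's algorithm: the alphabet, sequence length, `reward` function (fraction of "G" tokens standing in for a real biological objective), and all function names are hypothetical assumptions, and the rollout policy is uniform rather than a pre-trained diffusion model.

```python
import math
import random

# Hypothetical toy setup: a real application would use a pre-trained
# discrete diffusion model and a learned/biophysical reward.
ALPHABET = ["A", "C", "G", "T"]
SEQ_LEN = 6

def reward(seq):
    # Toy reward in [0, 1]: fraction of "G" tokens in the sequence.
    return seq.count("G") / len(seq)

class Node:
    """One node per token prefix in the search tree."""
    def __init__(self, prefix):
        self.prefix = prefix      # partial sequence built so far
        self.children = {}        # token -> child Node
        self.visits = 0
        self.value = 0.0          # sum of rollout rewards through this node

def select_child(node, c=1.4):
    # UCB1: balance mean reward (exploitation) against visit counts (exploration).
    return max(
        node.children.values(),
        key=lambda ch: ch.value / (ch.visits + 1e-9)
        + c * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1e-9)),
    )

def rollout(prefix):
    # Complete the sequence uniformly at random (stand-in for model sampling).
    seq = list(prefix)
    while len(seq) < SEQ_LEN:
        seq.append(random.choice(ALPHABET))
    return seq

def mcts_buffer(n_iters=400, buffer_size=10, seed=0):
    """Run MCTS and return the top-`buffer_size` (sequence, reward) pairs."""
    random.seed(seed)
    root = Node([])
    buffer = []
    for _ in range(n_iters):
        # Selection: descend through fully expanded internal nodes.
        node, path = root, [root]
        while len(node.prefix) < SEQ_LEN and len(node.children) == len(ALPHABET):
            node = select_child(node)
            path.append(node)
        # Expansion: add one untried child token.
        if len(node.prefix) < SEQ_LEN:
            untried = [a for a in ALPHABET if a not in node.children]
            child = Node(node.prefix + [random.choice(untried)])
            node.children[child.prefix[-1]] = child
            node = child
            path.append(node)
        # Simulation + backpropagation.
        seq = rollout(node.prefix)
        r = reward(seq)
        for n in path:
            n.visits += 1
            n.value += r
        buffer.append(("".join(seq), r))
    # Keep the highest-reward trajectories as the replay buffer.
    buffer.sort(key=lambda x: x[1], reverse=True)
    return buffer[:buffer_size]
```

In the full method, a buffer like this would then supply trajectories for fine-tuning the diffusion model under the stochastic optimal control objective, rather than training on raw rollouts from the current model.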
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 10061