Metropolis-Hastings Discrete Diffusion: Reward-Guided Sampling by Exploring the Clean Data Manifold

Prin Phunyaphibarn; Minhyuk Sung

09 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | CC BY 4.0

Keywords: Discrete Diffusion, Generative AI, Biological Sequence Design

Abstract: Discrete diffusion models have recently emerged as a powerful class of generative models for discrete data, showing effectiveness across scientific domains such as chemistry and biology. In these fields, data quality is often well defined (for example, drug-likeness in molecules), which makes reward-based guidance at inference time crucial. While reward guidance has been extensively studied for continuous diffusion models, existing approaches are either inapplicable to discrete diffusion because they rely on reward gradients, or ineffective because they lack local search. Some methods based on intermediate rewards are applicable to discrete diffusion but tend to underperform, since intermediate rewards are noisy due to the non-smooth nature of reward functions used in scientific domains. To address this, we propose Metropolis-Hastings Discrete Diffusion (MHDD), a method that performs effective test-time reward-guided sampling for discrete diffusion models, enabling local search without relying on intermediate rewards. The key idea is to construct a Markov chain of clean samples whose stationary distribution is the target distribution, using the Metropolis-Hastings algorithm. Applying the algorithm directly to discrete diffusion is infeasible, however, because the acceptance probability is intractable. We therefore design the proposal distribution by sequentially applying the forward and backward processes, which makes the acceptance probability tractable. Experiments on molecule and biological sequence generation with four different reward functions demonstrate that our method consistently outperforms prior approaches that rely on intermediate rewards.
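
To make the idea of a Markov chain over clean samples concrete, the following is a minimal sketch of textbook Metropolis-Hastings on a toy discrete sequence space. It is not the authors' MHDD implementation: the vocabulary size, sequence length, reward, temperature, and the uniform base distribution are all hypothetical stand-ins, and the "mask then refill uniformly" proposal only mimics the noise-then-denoise structure described in the abstract. Because this toy proposal is symmetric, the acceptance ratio reduces to a ratio of target densities; the tractable acceptance probability derived in the paper for a learned diffusion model is not reproduced here.

```python
import math
import random

VOCAB = 4    # hypothetical toy vocabulary size
LENGTH = 6   # hypothetical sequence length


def reward(x):
    """Hypothetical non-smooth reward: number of positions holding token 0."""
    return sum(1 for token in x if token == 0)


def log_target(x, temperature=0.5):
    """Unnormalised log-density of the target, log p(x) + r(x) / temperature.

    The base model p(x) is assumed uniform here, so its term is a constant
    and is dropped.
    """
    return reward(x) / temperature


def propose(x, rng, num_masked=2):
    """Noise-then-denoise stand-in: mask random positions, refill uniformly."""
    positions = rng.sample(range(len(x)), num_masked)
    y = list(x)
    for i in positions:
        y[i] = rng.randrange(VOCAB)
    return tuple(y)


def mh_chain(steps, seed=0):
    """Run Metropolis-Hastings and return the visited clean samples."""
    rng = random.Random(seed)
    x = tuple(rng.randrange(VOCAB) for _ in range(LENGTH))
    samples = []
    for _ in range(steps):
        y = propose(x, rng)
        # Symmetric proposal, so the ratio reduces to target(y) / target(x).
        accept_prob = math.exp(min(0.0, log_target(y) - log_target(x)))
        if rng.random() < accept_prob:
            x = y
        samples.append(x)
    return samples


samples = mh_chain(5000)
mean_reward = sum(reward(s) for s in samples) / len(samples)
```

Every state in `samples` is a fully specified (clean) sequence, so the reward is only ever evaluated on clean data, never on partially noised intermediates. In this toy setting the chain's average reward should sit well above the value of `LENGTH / VOCAB` that uniform sampling would give.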

Primary Area: generative models

Submission Number: 3417
