Abstract: Black-box optimization (BBO) is an important approach to design space exploration in high-dimensional domains, with applications in fields such as materials science and robotic design. Diffusion models used for BBO either require a differentiable proxy or lack direct guidance. In this paper, we propose a reward-guided approach that trains the diffusion model, formulated as a Markov decision process (MDP), to increase the likelihood that the posterior generates higher-reward samples. We use the Metropolis–Hastings (MH) algorithm for Markov Chain Monte Carlo (MCMC) sampling to guide the reverse process. We first pre-train the diffusion model to match the distribution of the initial data, then fine-tune it so that the model acts as a policy that adapts its parameters to generate high-reward samples. This is a policy-gradient method in which the policy is sampled from the pre-trained model to reduce variance during training. Our experiments demonstrate that the reward-guided diffusion model achieves state-of-the-art performance across a variety of design problems, particularly on problems where the oracle is non-differentiable or an exact function.
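For illustration only, the sketch below shows one way a Metropolis–Hastings step can bias samples from a pre-trained generative model toward higher reward without requiring a differentiable oracle; it is not the paper's exact algorithm. The names `generate`, `reward`, and the inverse temperature `beta` are assumptions introduced here for the example.

```python
import numpy as np

def mh_reward_guided_sampling(num_samples, generate, reward, beta=1.0, rng=None):
    """Independence Metropolis-Hastings sketch targeting
    p(x) ∝ p_model(x) * exp(beta * reward(x)).

    generate() -> one sample from the pre-trained (reverse-diffusion) model.
    reward(x)  -> scalar score from the black-box oracle (may be non-differentiable).
    beta       -> inverse temperature weighting the reward in the target.
    """
    rng = rng or np.random.default_rng()
    x = generate()                     # initialize from the pre-trained model
    samples = []
    for _ in range(num_samples):
        proposal = generate()          # independent proposal from the model
        # With proposals drawn from p_model, the model terms cancel in the
        # acceptance ratio, leaving only the exponentiated reward difference.
        log_alpha = beta * (reward(proposal) - reward(x))
        if np.log(rng.uniform()) < min(0.0, log_alpha):
            x = proposal               # accept the higher-reward (or lucky) move
        samples.append(x)              # otherwise keep the current sample
    return samples
```

Accepted samples such as these could then serve as the high-reward targets when fine-tuning the model as a policy, since the oracle is only queried through accept/reject decisions rather than through gradients.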
Track: long paper (up to 8 pages)
Keywords: Reward-guided Diffusion Model, Design Optimization, Metropolis–Hastings
Submission Number: 112