Keywords: bandit optimization, diffusion models, posterior sampling, algorithmic discovery
Abstract: Many real-world discovery problems, including drug and material design, can be modeled within the bandit optimization framework, where an agent selects a sequence of experiments to efficiently optimize an unknown reward function. However, classic bandit algorithms operate on fixed finite or continuous action sets. Finite sets make discovering novel designs impossible, and continuous sets often run into the curse of dimensionality, which renders these methods impractical. In this work, we first formalize the *generative bandit* setting, in which an agent wishes to maximize an unknown reward function over the support of a data distribution, often called the *data manifold*. This support implicitly encodes complex constraints (e.g., the geometry of valid molecules), and (unlabeled) sample data from the distribution is available (e.g., a dataset of valid molecules). We then propose Diffusion Posterior Sampling (DiffPS), an algorithm that tackles the exploration-exploitation problem directly on the learned data manifold by leveraging a conditional diffusion model. We formally show that the statistical complexity of DiffPS adapts to the *intrinsic dimensionality* of the data, overcoming the curse of dimensionality in high-dimensional settings. Our experimental evaluation supports these theoretical claims and demonstrates promising performance in practice.
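To make the generative bandit setting concrete, here is a minimal sketch under explicit assumptions. It is NOT DiffPS. It is a plain Thompson (posterior) sampling loop for a hypothetical linear reward r(x) = <theta_star, x> + noise. The "data manifold" is the unit circle in R^2, a 1-dimensional support inside a 2-dimensional ambient space, and a hypothetical `sample_manifold` function stands in for the generative model. Instead of a conditional diffusion model, the sketch draws a fresh batch of candidates from the sampler each round and keeps the one that scores best under a posterior sample. Every name and parameter here (`thompson_on_samples`, `n_candidates`, `noise`, `prior_prec`) is illustrative.

```python
import math
import random


def sample_manifold(rng):
    """Hypothetical stand-in for a generative model: draws a point from the
    'data manifold', here the unit circle embedded in R^2."""
    a = rng.uniform(0.0, 2.0 * math.pi)
    return (math.cos(a), math.sin(a))


def thompson_on_samples(theta_star, rounds=200, n_candidates=64, noise=0.1,
                        prior_prec=1.0, seed=0):
    """Thompson sampling for an assumed linear reward <theta_star, x> + noise,
    restricted to candidates drawn from the data distribution.
    Returns the chosen designs and the final posterior mean of theta."""
    rng = random.Random(seed)
    # Posterior precision A = prior_prec * I + sum(x x^T) / noise^2 (symmetric 2x2)
    a11, a12, a22 = prior_prec, 0.0, prior_prec
    b1, b2 = 0.0, 0.0  # b = sum(x * y) / noise^2
    w = 1.0 / (noise * noise)
    chosen = []
    for _ in range(rounds):
        # Posterior covariance C = A^{-1} and mean m = C b
        det = a11 * a22 - a12 * a12
        c11, c12, c22 = a22 / det, -a12 / det, a11 / det
        m1 = c11 * b1 + c12 * b2
        m2 = c12 * b1 + c22 * b2
        # Draw theta ~ N(m, C) using the Cholesky factor L of C (L L^T = C)
        l11 = math.sqrt(c11)
        l21 = c12 / l11
        l22 = math.sqrt(max(c22 - l21 * l21, 0.0))
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        t1 = m1 + l11 * z1
        t2 = m2 + l21 * z1 + l22 * z2
        # Only valid designs (samples from the manifold) are ever proposed
        cands = [sample_manifold(rng) for _ in range(n_candidates)]
        x = max(cands, key=lambda c: t1 * c[0] + t2 * c[1])
        # Run the "experiment": observe a noisy reward at the chosen design
        y = theta_star[0] * x[0] + theta_star[1] * x[1] + rng.gauss(0.0, noise)
        # Conjugate Gaussian update with the new observation
        a11 += w * x[0] * x[0]
        a12 += w * x[0] * x[1]
        a22 += w * x[1] * x[1]
        b1 += w * x[0] * y
        b2 += w * x[1] * y
        chosen.append(x)
    det = a11 * a22 - a12 * a12
    mean = ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)
    return chosen, mean


if __name__ == "__main__":
    chosen, mean = thompson_on_samples((1.0, 1.0))
    print("last design:", chosen[-1], "posterior mean:", mean)
```

With theta_star = (1, 1), the best valid design is (1/√2, 1/√2), which has reward √2. The chosen designs should drift toward it as the posterior concentrates. Because candidates come only from the sampler, the loop never proposes points off the manifold. That constraint is what the generative bandit setting encodes, and it is what a learned generative model would enforce on real data such as molecules.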
Supplementary Material: zip
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7192