Discrete Diffusion Reward Guidance Methods for Offline Reinforcement Learning

Published: 20 Jun 2023, Last Modified: 11 Oct 2023 · SODS 2023 Poster
Keywords: Offline Reinforcement Learning, Generative Modeling, Diffusion, Discrete Optimization
TL;DR: This study introduces three techniques for discrete trajectory optimization in offline reinforcement learning.
Abstract: As reinforcement learning challenges involve larger amounts of data in diverse forms, new techniques will be required to generate high-quality plans from only a compact representation of the original information. While recent diffusion generative policies provide a way to model complex action distributions directly in the original, high-dimensional feature space, they suffer from slow inference and have not yet been applied to reduced-dimension representations or to discrete tasks. In this work, we propose three diffusion-guidance techniques that operate on a reduced representation of the state obtained via quantile discretization: a gradient-based approach, a stochastic beam search approach, and a Q-learning approach. Our findings indicate that the gradient-based and beam search approaches improve scores on an offline reinforcement learning task by a significant margin.
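The abstract mentions reducing the state representation via quantile discretization. The sketch below illustrates one common way such a discretization can be done with per-feature quantile bins; the function name, bin count, and binning details are assumptions for illustration, as the abstract does not specify the exact scheme.

```python
import numpy as np

def quantile_discretize(states, n_bins=16):
    """Map continuous state features to discrete bin indices using
    per-feature quantile boundaries (hypothetical sketch; the paper's
    exact discretization scheme is not given in the abstract)."""
    # Interior quantile edges per feature: shape (n_bins - 1, n_features)
    probs = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    edges = np.quantile(states, probs, axis=0)
    # Each value's bin index = number of edges it exceeds (0 .. n_bins - 1)
    discrete = np.stack(
        [np.searchsorted(edges[:, j], states[:, j]) for j in range(states.shape[1])],
        axis=1,
    )
    return discrete, edges
```

The resulting integer grid of bin indices gives each state a compact discrete token sequence, which is the kind of representation a discrete diffusion model or beam search can operate over.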
Submission Number: 7