DreamFuser: Value-guided Diffusion Policy for Offline Reinforcement Learning

24 Sept 2023 (modified: 11 Feb 2024) | Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Trajectory-based Reinforcement Learning; Diffusion Model; Offline Reinforcement Learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Recent advances in reinforcement learning have underscored the potential of diffusion models, particularly for policy learning. Earlier applications focused predominantly on single-timestep settings, but trajectory-based diffusion policy learning promises clear benefits, especially for low-level control tasks. We introduce DreamFuser, a trajectory-based value optimization approach that combines diffusion-based trajectory learning with efficient Q-function learning over states and noisy actions. To address the computational cost of sampling actions from a diffusion policy during training, we build DreamFuser on the Generalized Noisy Action Markov Decision Process (GNMDP), which treats the diffusion denoising process as part of the MDP transition. Empirical tests show that DreamFuser has advantages over existing diffusion policy algorithms, notably on low-level control tasks. On D4RL, the standard benchmark for offline reinforcement learning, DreamFuser matches or even outperforms contemporary methods. This work also clarifies the parallels between optimizing DreamFuser over the GNMDP and optimizing Diffusion Policy over the MDP, and demonstrates DreamFuser's computational and memory advantages.
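The idea of viewing denoising as part of the MDP transition can be illustrated with a toy sketch. This is not the paper's definition of the GNMDP: the abstract does not give one, so everything below is an assumption made for illustration. In particular, `AugState`, `gnmdp_step`, `toy_denoise`, and `toy_env_step` are hypothetical names, the scalar environment is invented, and the choice of zero reward and no discounting during denoising steps is only one plausible convention.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class AugState:
    """Hypothetical augmented state: environment state plus a noisy action."""
    env_state: float     # toy scalar environment state
    noisy_action: float  # action at the current denoising level
    k: int               # denoising steps remaining (0 means the action is clean)


def gnmdp_step(
    s: AugState,
    denoise: Callable[[float, float, int], float],
    env_step: Callable[[float, float], Tuple[float, float]],
    fresh_noise: float,
    num_denoise_steps: int,
    gamma: float = 0.99,
) -> Tuple[AugState, float, float]:
    """One transition of the toy augmented process.

    Returns (next_state, reward, discount).
    """
    if s.k > 0:
        # Denoising transition (assumed convention): the environment is
        # frozen, no reward is emitted, and no discount is applied.
        action = denoise(s.env_state, s.noisy_action, s.k)
        return AugState(s.env_state, action, s.k - 1), 0.0, 1.0
    # k == 0: the action is fully denoised, so execute it in the environment
    # and restart the denoising chain from fresh noise.
    next_env, reward = env_step(s.env_state, s.noisy_action)
    return AugState(next_env, fresh_noise, num_denoise_steps), reward, gamma


def toy_denoise(env_state: float, action: float, k: int) -> float:
    # Toy denoiser: move the action halfway toward the target -env_state.
    return action + 0.5 * (-env_state - action)


def toy_env_step(env_state: float, action: float) -> Tuple[float, float]:
    # Toy dynamics: the action shifts the state; reward penalizes distance from 0.
    next_state = env_state + action
    return next_state, -abs(next_state)


state = AugState(env_state=1.0, noisy_action=0.0, k=2)
trace = []
for _ in range(3):
    state, reward, discount = gnmdp_step(
        state, toy_denoise, toy_env_step, fresh_noise=0.0, num_denoise_steps=2
    )
    trace.append((state, reward, discount))
```

Under these assumptions, a value function defined on `AugState` can be trained with ordinary one-step bootstrapping across denoising steps, without running a full denoising chain to obtain an action for each update.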
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9052