Contrastive Diffuser: Planning Towards High Return States via Contrastive Learning

16 Sept 2023 (modified: 11 Feb 2024) Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Offline Reinforcement Learning; Diffusion Model; Contrastive Learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We propose CDiffuser, which devises a return contrast mechanism to enhance trajectory-generation-based diffusion RL by pulling the states in generated trajectories towards high-return states while pushing them away from low-return states.
Abstract: Applying diffusion models to long-term planning in reinforcement learning has recently gained much attention. Relying on the ability of diffusion models to capture the underlying data distribution, these methods generate subsequent trajectories for planning and achieve significant improvements. However, they neglect the differences among samples in offline datasets, where different states have different returns. They simply use diffusion to learn the data distribution and generate trajectories whose states follow the same distribution as the offline dataset. As a result, the probability that these models reach high-return states depends largely on the dataset distribution. Even when equipped with a guidance model, performance remains suppressed. To address these limitations, we propose a novel method called CDiffuser, which devises a return contrast mechanism that pulls the states in generated trajectories towards high-return states while pushing them away from low-return states. Experiments on nine commonly used D4RL benchmarks demonstrate the effectiveness of the proposed method. Our code is publicly available at https://anonymous.4open.science/r/ContrastiveDiffuser.
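The return contrast idea in the abstract can be illustrated with a small sketch. The abstract does not give the form of the loss, so the InfoNCE-style objective below is an assumption, not the authors' formulation; the function name `return_contrast_loss`, the cosine similarity, and the `temperature` parameter are all hypothetical choices made for illustration. The loss is small when a generated state lies close to the high-return states and far from the low-return ones.

```python
import numpy as np


def return_contrast_loss(generated, high_return, low_return, temperature=0.1):
    """InfoNCE-style contrast (an assumed form, not taken from the paper).

    generated:   (d,)   one state from a generated trajectory
    high_return: (P, d) positive samples, i.e. high-return states
    low_return:  (N, d) negative samples, i.e. low-return states
    """
    def cosine(vec, mat):
        # Cosine similarity between one vector and each row of a matrix.
        vec = vec / (np.linalg.norm(vec) + 1e-8)
        mat = mat / (np.linalg.norm(mat, axis=-1, keepdims=True) + 1e-8)
        return mat @ vec

    pos = cosine(generated, high_return) / temperature
    neg = cosine(generated, low_return) / temperature
    logits = np.concatenate([pos, neg])

    # Numerically stable log-sum-exp for numerator and denominator.
    m_all, m_pos = logits.max(), pos.max()
    log_denom = m_all + np.log(np.exp(logits - m_all).sum())
    log_num = m_pos + np.log(np.exp(pos - m_pos).sum())

    # -log( sum_pos exp(sim) / sum_all exp(sim) )
    return float(log_denom - log_num)


# Toy 2-D example with made-up states (illustrative data only).
high = np.array([[1.0, 0.0], [0.9, 0.1]])
low = np.array([[-1.0, 0.0], [-0.9, -0.1]])
loss_near_high = return_contrast_loss(np.array([1.0, 0.05]), high, low)
loss_near_low = return_contrast_loss(np.array([-1.0, -0.05]), high, low)
```

In a training loop, a term of this kind would presumably be added to the usual diffusion objective so that minimizing it moves generated states towards the high-return set; how CDiffuser selects the two sets and weights the term is not stated in the abstract.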
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 513