Adjustable Quantile-Guided Diffusion Policy for Diverse Behavior Generation in Offline RL

24 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Supplementary Material: zip
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: offline reinforcement learning, diffusion
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Offline Reinforcement Learning (RL) addresses the challenge of learning effective policies from pre-collected data, making it a promising approach for real-world applications where online interaction with an environment is costly or impractical. We propose an offline RL method named Quantile-Guided Diffusion Policy (qGDP), which trains a quantile network to label the training dataset, uses the labeled samples to train a conditional diffusion model, and generates new actions with the trained model via classifier-free guidance. By adjusting the input condition and the guidance scale at sampling time, qGDP can shift its preference between imitating and improving upon the behavior policy without retraining the model, significantly reducing the cost of tuning the algorithm. qGDP also exhibits strong generalization capabilities. Experimental results on the D4RL benchmark demonstrate state-of-the-art performance and greater computational efficiency than other diffusion-based methods.
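
To make the abstract's sampling idea concrete, below is a minimal, hypothetical sketch of classifier-free guided action sampling with an adjustable guidance scale. The denoiser eps_model, the null condition, and the linear noise schedule are placeholder assumptions for illustration, not the paper's implementation.

import torch

def cfg_denoise(eps_model, x_t, t, cond, null_cond, guidance_scale):
    # Blend conditional and unconditional noise predictions.
    # guidance_scale = 0 recovers the unconditional (behavior-imitating) model;
    # larger values push samples toward the conditioned (high-quantile) actions.
    eps_cond = eps_model(x_t, t, cond)         # conditioned on the quantile label
    eps_uncond = eps_model(x_t, t, null_cond)  # condition dropped / nulled out
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

@torch.no_grad()
def sample_action(eps_model, cond, null_cond, action_dim,
                  n_steps=50, guidance_scale=2.0, device="cpu"):
    # Simple DDPM-style reverse process with a placeholder linear beta schedule.
    betas = torch.linspace(1e-4, 2e-2, n_steps, device=device)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(1, action_dim, device=device)
    for i in reversed(range(n_steps)):
        t = torch.full((1,), i, device=device, dtype=torch.long)
        eps = cfg_denoise(eps_model, x, t, cond, null_cond, guidance_scale)
        mean = (x - betas[i] / torch.sqrt(1.0 - alpha_bars[i]) * eps) / torch.sqrt(alphas[i])
        x = mean + torch.sqrt(betas[i]) * torch.randn_like(x) if i > 0 else mean
    return x

In this sketch, changing cond (e.g., a higher quantile label) or guidance_scale at inference time changes the generation preference without any retraining, which is the adjustability the abstract refers to.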
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9241