Keywords: offline-to-online reinforcement learning, data augmentation, diffusion models
Abstract: Offline-to-online Reinforcement Learning (O2O RL) aims to perform online fine-tuning on an offline pre-trained policy to minimize costly online interactions. Existing methods generate new data from either offline or online data for augmentation, which improves performance during online fine-tuning. However, they do not fully analyze or exploit both types of data simultaneously. Offline data provides diversity that keeps the agent from settling prematurely on suboptimal policies, while online data improves training stability and speeds up convergence. In this paper, we propose a data augmentation approach, Classifier-Free Diffusion Generation (CFDG). To account for the differences between offline and online data, we use a conditional diffusion model to generate both types of data during the online phase, improving the quality of the generated samples. Experimental results show that CFDG outperforms both replaying the two data types and generating new data with a standard diffusion model. Our method is versatile and can be integrated with existing offline-to-online RL algorithms. By integrating CFDG with the popular methods IQL, PEX, and APL, we achieve a notable 15% average improvement in empirical performance on D4RL benchmarks such as MuJoCo and AntMaze.
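To make the conditioning idea in the abstract concrete, below is a minimal sketch (not the authors' implementation) of classifier-free guidance with a denoiser conditioned on a data-source label, where offline-like transitions use label 0 and online-like transitions use label 1. The names `Denoiser`, `guided_eps`, and the guidance weight `w` are illustrative assumptions, not part of the submission.

```python
# Minimal classifier-free guidance sketch for conditional transition generation.
# Assumed setup: a denoiser predicts the noise added to a flattened transition
# vector x_t, conditioned on a timestep t and an optional data-source label.
import torch
import torch.nn as nn


class Denoiser(nn.Module):
    """Noise-prediction network with an extra "null" label index so the same
    model can be queried conditionally and unconditionally."""

    def __init__(self, x_dim, n_labels=2, hidden=256):
        super().__init__()
        self.null_label = n_labels  # reserved index meaning "no label"
        self.label_emb = nn.Embedding(n_labels + 1, hidden)
        self.net = nn.Sequential(
            nn.Linear(x_dim + 1 + hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, x_dim),
        )

    def forward(self, x_t, t, label=None):
        # During training the label would be dropped (replaced by the null
        # index) with some probability so both branches are learned.
        if label is None:
            label = torch.full((x_t.shape[0],), self.null_label,
                               dtype=torch.long, device=x_t.device)
        emb = self.label_emb(label)
        return self.net(torch.cat([x_t, t.float().unsqueeze(-1), emb], dim=-1))


def guided_eps(model, x_t, t, label, w=1.5):
    """Classifier-free guidance: eps = (1 + w) * eps_cond - w * eps_uncond."""
    eps_cond = model(x_t, t, label)
    eps_uncond = model(x_t, t, None)
    return (1.0 + w) * eps_cond - w * eps_uncond
```

In a sampler, `guided_eps` would replace the plain noise prediction at each reverse-diffusion step, with the label set to 0 or 1 depending on whether offline-style or online-style transitions are being generated.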
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8447