Abstract: Offline-to-online reinforcement learning (O2O RL) fine-tunes an offline pre-trained policy online in order to minimize costly online interactions. Existing work uses offline datasets to generate data that conforms to the online data distribution for augmentation. However, the generated data still exhibits a gap with the online data, limiting overall performance. To address this, we propose a new data augmentation approach, Classifier-Free Diffusion Generation (CFDG). Without introducing the overhead of training an additional classifier, CFDG leverages classifier-free guidance diffusion to significantly improve the quality of data generated from the differently distributed offline and online data. It further employs a reweighting method that aligns more of the generated data with the online data, improving performance while maintaining the agent's stability. Experimental results show that CFDG outperforms both replaying the two data types and generating new data with a standard diffusion model. Our method is versatile and can be integrated with existing offline-to-online RL algorithms. Applying CFDG to the popular methods IQL, PEX, and APL yields a notable 15\% average improvement on D4RL benchmark domains such as MuJoCo and AntMaze.
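For readers unfamiliar with classifier-free guidance, the sketch below shows the standard guided denoising step that such a diffusion generator relies on. This is a minimal illustration under common CFG conventions, not the authors' implementation; the function and parameter names (`cfg_noise_estimate`, `guidance_scale`) are ours.

```python
import torch

@torch.no_grad()
def cfg_noise_estimate(model, x_t, t, cond, guidance_scale=2.0):
    """Classifier-free-guided noise estimate for one denoising step.

    `model(x, t, cond)` is assumed to predict the diffusion noise eps,
    with `cond=None` selecting the unconditional branch; all names here
    are illustrative, not taken from the paper.
    """
    eps_cond = model(x_t, t, cond)    # conditional noise prediction
    eps_uncond = model(x_t, t, None)  # unconditional noise prediction
    # Guidance: extrapolate toward the conditional prediction without
    # training or querying a separate classifier.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

In standard classifier-free training, the conditioning signal (here, plausibly a label distinguishing offline-style from online-style data) is randomly dropped so that a single network learns both the conditional and unconditional branches; this is what removes the separate-classifier training overhead that classifier-guided diffusion would incur.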
Lay Summary: Teaching AI systems to learn from trial and error can be expensive or dangerous, especially in real-world settings like robotics or healthcare. One solution is to first train these systems using existing data, and then fine-tune them with minimal real-world interaction — a process called offline-to-online reinforcement learning.
But there's a challenge: the synthetic data we generate from past experience often doesn't match the conditions of real-world use, which hurts performance. We developed a method called Classifier-Free Diffusion Generation (CFDG) to close this gap. It uses a powerful generative AI model to create more realistic training data, and it doesn't require training any extra classifiers, which keeps things efficient. We also introduced a technique to select only the most useful generated data: the kind that best matches real-world conditions.
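To make the selection idea concrete, here is a deliberately simplified sketch, not CFDG's actual reweighting rule: it scores each generated sample by its distance to the nearest real online sample, so data resembling real online experience is replayed more often. The function name and the nearest-neighbor heuristic are our illustrative assumptions.

```python
import numpy as np

def onlineness_weights(generated, online, temperature=1.0):
    """Toy reweighting rule: upweight generated samples that lie near
    the online data. `generated` is (N, D), `online` is (M, D).
    """
    # Distance from each generated sample to its nearest online sample.
    dists = np.linalg.norm(
        generated[:, None, :] - online[None, :, :], axis=-1
    ).min(axis=1)
    weights = np.exp(-dists / temperature)  # closer -> larger weight
    return weights / weights.sum()          # normalized sampling weights
```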
When we applied CFDG to standard benchmarks, it boosted performance by an average of 15%. Because it works well with existing methods, CFDG can help build more effective and reliable AI systems with fewer costly interactions.
Primary Area: Reinforcement Learning
Keywords: offline-to-online reinforcement learning, data augmentation, diffusion models
Submission Number: 5273