Enhancing Offline-to-Online Reinforcement Learning by Adaptive Experience Aligned Diffusion Sampling

ICLR 2026 Conference Submission13994 Authors

18 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Offline-to-online RL; Data Augmentation; Diffusion Model
TL;DR: We introduce Adaptive Data Aligned Diffusion Sampling (AD2S), which aims to accelerate offline-to-online RL fine-tuning from the perspective of data generation.
Abstract: Pretraining models on diverse prior data and fine-tuning them on domain-specific tasks is an efficient training paradigm for obtaining promising performance in scenarios with limited data or interaction. In the context of reinforcement learning (RL), this paradigm is known as offline-to-online (O2O) RL, where the pretrained agent must revise and improve its offline-pretrained policy based on its own experience in the online environment. Although prior works have demonstrated the efficiency of fine-tuning an offline-pretrained agent without offline data, they often require additional designs to overcome the unstable online fine-tuning induced by the discrepancy between offline and online data. Moreover, existing works show that introducing offline data when training an online agent from scratch is sample-efficient. Therefore, properly reusing the knowledge in the offline data should benefit O2O RL. In this paper, we introduce Adaptive Data Aligned Diffusion Sampling (AD2S), which aims to accelerate O2O RL fine-tuning from the perspective of data generation. Our method comprises three key components: distance-based experience alignment, curiosity-driven data prioritization, and data regeneration with amplified guidance. AD2S is a plug-in approach and can be combined with existing methods in the offline-to-online RL setting. Applying AD2S to the off-the-shelf method Cal-QL yields empirical improvements on commonly studied benchmarks.
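The abstract's first two components (distance-based experience alignment and curiosity-driven data prioritization) can be illustrated with a minimal sketch. All names, the exponential-distance weighting, and the novelty bonus below are illustrative assumptions, not the authors' actual AD2S implementation:

```python
import math

def ad2s_priorities(offline_states, online_states, novelty, alpha=1.0, beta=1.0):
    """Hypothetical AD2S-style sampling priorities over offline transitions.

    Combines (i) a distance-based alignment term that up-weights offline
    states close to recent online experience, and (ii) a curiosity bonus
    for transitions the agent finds novel. The exact scoring rule here is
    an assumption for illustration only.
    """
    def dist(a, b):
        # Euclidean distance between two state vectors
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    scores = []
    for s, n in zip(offline_states, novelty):
        # alignment: distance to the nearest online state, mapped to (0, 1]
        d = min(dist(s, o) for o in online_states)
        align = math.exp(-alpha * d)
        # curiosity-driven bonus: novel transitions get a larger weight
        scores.append(align * (1.0 + beta * n))

    # normalize into a sampling distribution over the offline buffer
    total = sum(scores)
    return [w / total for w in scores]

# Toy usage: two offline states, one matching the online experience exactly.
# With equal novelty, the aligned state should receive the higher priority.
p = ad2s_priorities([(0.0, 0.0), (5.0, 5.0)], [(0.0, 0.0)], [0.0, 0.0])
```

The third component, data regeneration with amplified guidance, would then resample prioritized transitions through a guided diffusion model, which is beyond the scope of this sketch.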
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 13994