Automatic Fine-Tuned Offline-to-Online Reinforcement Learning via Increased Simple Moving Average Q-value

21 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Reinforcement Learning, Machine Learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: A novel policy regularization method for offline-to-online Reinforcement Learning.
Abstract: Offline-to-online reinforcement learning starts from a pre-trained offline model and continues to learn by interacting with the environment online. The challenge is to adapt to distribution drift while simultaneously preserving the quality of the learned policy. We propose a novel policy regularization method that automatically fine-tunes the model by selectively increasing the average estimated Q-value in the sampled batches. As a result, our models maintain the performance of the pre-trained model and improve upon it, unlike methods that must learn from scratch. Furthermore, we add $\mathcal{O}(1)$-complexity replay buffer techniques to adapt to distribution drift efficiently. Our experimental results indicate that the proposed method outperforms state-of-the-art methods on the D4RL benchmark.
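For illustration only, the sketch below shows one plausible reading of the two ingredients mentioned in the abstract: a replay buffer with constant-time insertion and sampling, and a simple moving average of the mean estimated Q-value over sampled batches. This is not the authors' implementation; the class and method names (`RingReplayBuffer`, `BatchQSMA`, `window`) are hypothetical, and the actual regularization rule is not specified in the abstract.

```python
import collections

import numpy as np


class RingReplayBuffer:
    """Fixed-capacity ring buffer with O(1) insertion and uniform sampling.

    Generic sketch of a constant-time replay buffer; the paper's specific
    buffer techniques are not described in the abstract.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.storage = [None] * capacity
        self.size = 0
        self.next_idx = 0

    def add(self, transition) -> None:
        # Overwrite the oldest transition once the buffer is full: O(1) per insert.
        self.storage[self.next_idx] = transition
        self.next_idx = (self.next_idx + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size: int) -> list:
        # Uniform random indices; O(1) work per sampled transition.
        idxs = np.random.randint(0, self.size, size=batch_size)
        return [self.storage[i] for i in idxs]


class BatchQSMA:
    """Simple moving average of the mean estimated Q-value per sampled batch.

    Hypothetical helper: one way to track whether the average Q-value of
    sampled batches is increasing during online fine-tuning.
    """

    def __init__(self, window: int = 100):
        self.window = collections.deque(maxlen=window)

    def update(self, batch_q_values: np.ndarray) -> float:
        # Record the batch mean and return the current moving average.
        self.window.append(float(np.mean(batch_q_values)))
        return float(np.mean(self.window))
```

A training loop could call `BatchQSMA.update` on the critic's Q-estimates for each sampled batch and use the returned moving average as a signal for when to tighten or relax policy regularization; how that signal is used is a design choice left open by the abstract.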
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4209