Automatically Fine-Tuned Offline-to-Online Reinforcement Learning via Increased Simple Moving Average Q-value
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Reinforcement Learning, Machine Learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: A novel policy regularization method for offline-to-online Reinforcement Learning.
Abstract: Offline-to-online reinforcement learning starts from a pre-trained offline model and continues learning
through online interaction with the environment. Its central challenge is to adapt to distribution drift while
preserving the quality of the learned policy.
We propose a novel policy regularization method that automatically fine-tunes the model by
selectively increasing the average estimated Q-value of the sampled batches. As a result, our models retain the
performance of the pre-trained model and improve upon it, unlike methods that must learn from scratch.
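The exact objective is not spelled out in this abstract; as one plausible reading, the sketch below (not the authors' code) shows a regularization term that is switched on only when the batch-mean Q estimate falls below a simple moving average (SMA) of recent batch-mean Q values. The class name, window size, and weight are assumptions for illustration.

```python
# Minimal sketch, assuming the regularizer compares the current batch-mean Q
# against a simple moving average (SMA) of recent batch-mean Q estimates and
# penalizes the actor only when the current value is below that average.
from collections import deque

import torch


class SMAQRegularizer:
    def __init__(self, window: int = 100, weight: float = 1.0):
        self.history = deque(maxlen=window)  # recent batch-mean Q values
        self.weight = weight

    def penalty(self, q_batch: torch.Tensor) -> torch.Tensor:
        """Return a loss term that pushes the batch-mean Q above its SMA."""
        batch_mean = q_batch.mean()
        if len(self.history) == 0:
            sma = batch_mean.detach()
        else:
            sma = torch.tensor(sum(self.history) / len(self.history),
                               device=q_batch.device)
        self.history.append(batch_mean.item())
        # "Selective" regularization: contribute a penalty only when the
        # current batch-mean Q is below the moving average, otherwise zero.
        return self.weight * torch.relu(sma - batch_mean)


# Hypothetical usage inside an actor update (critic and policy assumed given):
# q_batch = critic(states, policy(states))
# actor_loss = -q_batch.mean() + reg.penalty(q_batch)
```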
Furthermore, we add replay buffer techniques with $\mathcal{O}(1)$ complexity to adapt to distribution
drift efficiently. Our experimental results indicate that the proposed method outperforms state-of-the-art methods
on the D4RL benchmark.
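The abstract does not detail the specific buffer techniques; the following generic ring buffer is a sketch of the kind of constant-time operation referred to, not the paper's implementation. Evicting the oldest transitions lets the buffer track the shifting online data distribution.

```python
# Minimal sketch (assumption): a ring buffer with O(1) insertion and O(1)
# per-index uniform sampling, commonly used for offline-to-online fine-tuning.
import numpy as np


class RingReplayBuffer:
    def __init__(self, capacity: int, obs_dim: int, act_dim: int):
        self.capacity = capacity
        self.obs = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.act = np.zeros((capacity, act_dim), dtype=np.float32)
        self.rew = np.zeros(capacity, dtype=np.float32)
        self.next_obs = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.done = np.zeros(capacity, dtype=np.float32)
        self.ptr = 0   # next write position
        self.size = 0  # number of valid entries

    def add(self, o, a, r, o2, d):
        """O(1): overwrite the oldest slot once the buffer is full."""
        self.obs[self.ptr] = o
        self.act[self.ptr] = a
        self.rew[self.ptr] = r
        self.next_obs[self.ptr] = o2
        self.done[self.ptr] = d
        self.ptr = (self.ptr + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size: int):
        """Uniform sampling: O(1) per drawn index."""
        idx = np.random.randint(0, self.size, size=batch_size)
        return (self.obs[idx], self.act[idx], self.rew[idx],
                self.next_obs[idx], self.done[idx])
```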
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4209