Automatic Fine-Tuned Offline-to-Online Reinforcement Learning via Increased Simple Moving Average Q-value

21 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Reinforcement Learning, Machine Learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: A novel policy regularization method for offline-to-online Reinforcement Learning.
Abstract: Offline-to-online reinforcement learning starts from a pre-trained offline model and continues to learn by interacting with the environment online. The challenge is to adapt to distribution drift while simultaneously preserving the quality of the learned policy. We propose a novel policy regularization method that automatically fine-tunes the model by selectively increasing the average estimated Q-value in the sampled batches. As a result, our models maintain the performance of the pre-trained model and improve upon it, unlike methods that must learn from scratch. Furthermore, we add $\mathcal{O}(1)$-complexity replay buffer techniques to adapt to distribution drift efficiently. Our experimental results indicate that the proposed method outperforms state-of-the-art methods on the D4RL benchmark.
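For illustration only, the sketch below shows one plausible reading of the two ingredients mentioned in the abstract: a replay buffer with constant-time insertion and sampling, and a simple moving average of the mean estimated Q-value over sampled batches. This is not the authors' implementation; the class and method names (`RingReplayBuffer`, `BatchQSMA`, `window`) are hypothetical, and the actual regularization rule is not specified in the abstract.

```python
import collections

import numpy as np


class RingReplayBuffer:
    """Fixed-capacity ring buffer with O(1) insertion and uniform sampling.

    Generic sketch of a constant-time replay buffer; the paper's specific
    buffer techniques are not described in the abstract.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.storage = [None] * capacity
        self.size = 0
        self.next_idx = 0

    def add(self, transition) -> None:
        # Overwrite the oldest transition once the buffer is full: O(1) per insert.
        self.storage[self.next_idx] = transition
        self.next_idx = (self.next_idx + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size: int) -> list:
        # Uniform random indices; O(1) work per sampled transition.
        idxs = np.random.randint(0, self.size, size=batch_size)
        return [self.storage[i] for i in idxs]


class BatchQSMA:
    """Simple moving average of the mean estimated Q-value per sampled batch.

    Hypothetical helper: one way to track whether the average Q-value of
    sampled batches is increasing during online fine-tuning.
    """

    def __init__(self, window: int = 100):
        self.window = collections.deque(maxlen=window)

    def update(self, batch_q_values: np.ndarray) -> float:
        # Record the batch mean and return the current moving average.
        self.window.append(float(np.mean(batch_q_values)))
        return float(np.mean(self.window))
```

A training loop could call `BatchQSMA.update` on the critic's Q-estimates for each sampled batch and use the returned moving average as a signal for when to tighten or relax policy regularization; how that signal is used is a design choice left open by the abstract.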
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4209