Bayesian Offline-to-Online Reinforcement Learning: A Realist Approach

21 Sept 2023 (modified: 11 Feb 2024), Submitted to ICLR 2024
Primary Area: reinforcement learning
Keywords: offline-to-online RL, deep reinforcement learning, Bayesian RL
TL;DR: We propose a Bayesian approach for offline-to-online RL and show its advantage both theoretically and empirically.
Abstract: Offline reinforcement learning (RL) is crucial for real-world applications where exploration can be costly. However, offline-learned policies are often suboptimal and require online finetuning. In this paper, we tackle the fundamental dilemma of offline-to-online finetuning: if the agent remains pessimistic, it may fail to learn a better policy, while if it becomes optimistic directly, performance may suffer from a sudden drop. We show theoretically that the agent should adopt neither optimistic nor pessimistic policies during the offline-to-online transition. Instead, we propose a Bayesian approach, where the agent acts by sampling from its posterior and updates its belief accordingly. We demonstrate that such an agent can avoid a sudden performance drop while still being guaranteed to find the optimal policy. Based on our theoretical findings, we introduce a novel algorithm that outperforms existing benchmarks in our experiments, demonstrating the efficacy of our approach. Overall, the proposed approach provides a new perspective on offline-to-online finetuning that has the potential to enable more effective learning from offline data.
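As a rough illustration of "acting by sampling from the posterior and updating the belief", the sketch below runs Beta-Bernoulli posterior sampling (Thompson sampling) on a toy two-armed bandit whose prior is seeded by offline counts. This is not the algorithm proposed in the submission. The bandit setting, the function name `thompson_sampling`, and the values in `true_means` and `offline_counts` are all assumptions made up for this example.

```python
import random


def thompson_sampling(true_means, offline_counts, steps=2000, seed=0):
    """Beta-Bernoulli posterior sampling on a toy multi-armed bandit.

    offline_counts holds hypothetical (successes, failures) pairs per arm,
    standing in for an offline dataset; they are added to a uniform
    Beta(1, 1) prior to form the initial posterior.
    """
    rng = random.Random(seed)
    # Posterior parameters per arm: alpha = 1 + successes, beta = 1 + failures.
    alpha = [1 + s for s, _ in offline_counts]
    beta = [1 + f for _, f in offline_counts]
    pulls = [0] * len(true_means)

    for _ in range(steps):
        # Act: draw one plausible mean per arm from the current posterior
        # and play the arm whose sampled mean is largest.
        samples = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
        arm = max(range(len(samples)), key=samples.__getitem__)

        # Observe a Bernoulli reward from the (unknown to the agent) true mean.
        reward = 1 if rng.random() < true_means[arm] else 0

        # Update the belief about the arm that was played.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, alpha, beta


# Assumed toy setup: arm 0 is well covered offline but mediocre,
# arm 1 is barely covered offline but actually better.
true_means = [0.5, 0.8]
offline_counts = [(50, 50), (1, 1)]
pulls, alpha, beta = thompson_sampling(true_means, offline_counts)
print("pulls per arm:", pulls)
```

In this toy setup the agent neither ignores the poorly covered arm, as a purely pessimistic rule would, nor commits to it outright. The wide posterior on arm 1 makes it the sampled maximizer often enough to be tried, and its posterior then sharpens as online rewards arrive.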
Supplementary Material: zip
Submission Number: 3172