Sample Efficient Offline-to-Online Reinforcement Learning

Published: 01 Jan 2024, Last Modified: 06 Mar 2024. IEEE Trans. Knowl. Data Eng., 2024.
Abstract: Offline reinforcement learning (RL) makes it possible to train agents entirely from a previously collected dataset. However, constrained by the quality of the offline dataset, offline RL agents typically achieve limited performance and cannot be deployed directly. It is therefore desirable to further fine-tune pretrained offline RL agents via online interaction with the environment. Existing offline-to-online RL algorithms suffer from low sample efficiency due to two inherent challenges: limited exploration and distribution shift. To this end, we propose a sample-efficient offline-to-online RL algorithm via Optimistic Exploration and Meta Adaptation (OEMA). Specifically, we first propose an optimistic exploration strategy following the principle of optimism in the face of uncertainty, which allows the agent to explore the environment sufficiently yet stably. Moreover, we propose a meta-learning-based adaptation method that reduces the distribution shift and accelerates the offline-to-online adaptation process. We empirically demonstrate that OEMA improves sample efficiency on the D4RL benchmark, and we provide in-depth analyses verifying the effectiveness of both optimistic exploration and meta adaptation.
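
The abstract does not describe implementation details, so the following is only a rough, hypothetical sketch of the optimism-in-the-face-of-uncertainty idea it references, not the paper's actual method: a small ensemble of Q-networks whose disagreement acts as an uncertainty bonus, with the agent choosing the candidate action that maximizes an upper-confidence estimate. The names (QEnsemble, optimistic_action), the candidate-sampling scheme, and the parameters n_candidates and beta are illustrative assumptions.

    import torch
    import torch.nn as nn

    class QEnsemble(nn.Module):
        """Small ensemble of Q-networks; disagreement across members
        serves as a proxy for epistemic uncertainty (illustrative only)."""
        def __init__(self, obs_dim, act_dim, n_members=4, hidden=256):
            super().__init__()
            self.members = nn.ModuleList([
                nn.Sequential(
                    nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                    nn.Linear(hidden, hidden), nn.ReLU(),
                    nn.Linear(hidden, 1),
                )
                for _ in range(n_members)
            ])

        def forward(self, obs, act):
            x = torch.cat([obs, act], dim=-1)
            # Stack member outputs: shape (n_members, batch, 1)
            return torch.stack([m(x) for m in self.members], dim=0)

    def optimistic_action(q_ensemble, sample_actions, obs, n_candidates=16, beta=1.0):
        """Pick the candidate action with the highest upper-confidence value:
        ensemble mean Q plus beta times the ensemble standard deviation.
        `sample_actions` is assumed to map a batch of observations to sampled
        actions from the current policy (hypothetical interface)."""
        with torch.no_grad():
            obs_rep = obs.unsqueeze(0).repeat(n_candidates, 1)       # (n_candidates, obs_dim)
            candidates = sample_actions(obs_rep)                      # (n_candidates, act_dim)
            q_values = q_ensemble(obs_rep, candidates).squeeze(-1)    # (n_members, n_candidates)
            ucb = q_values.mean(dim=0) + beta * q_values.std(dim=0)   # optimistic value per candidate
            return candidates[ucb.argmax()]

Under this sketch, larger beta pushes the agent toward actions the ensemble disagrees on (more exploration), while beta = 0 reduces to greedy selection under the mean Q-estimate; how OEMA actually balances exploration and stability is specified in the paper itself.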