Guided Decoupled Exploration for Offline Reinforcement Learning Fine-tuning

19 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Exploration, Reinforcement Learning, Offline RL, Fine-Tuning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: Fine-tuning offline RL agents with online interactions
Abstract: Fine-tuning pre-trained offline Reinforcement Learning (RL) agents with online interactions is a promising strategy to improve sample efficiency. In this work, we study the problem of sample-efficient fine-tuning for offline RL agents. We first discuss three challenges related to over-concentration on the offline dataset, *i.e.,* inefficient exploration, distributionally shifted samples, and distorted value functions. We then focus on the exploration issue and investigate the important open question of how to explore more efficiently in offline RL fine-tuning. Through detailed experiments, we find that it is important to relax the conservative constraints to encourage exploration while avoiding reckless actions that could ruin the learned policy. To this end, we introduce Guided Decoupled Exploration (GDE) for fine-tuning offline RL agents, in which we decouple the exploration and exploitation policies and use a dynamic teacher policy to guide exploration. Experiments on the D4RL benchmark tasks showcase the effectiveness of the proposed method.
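Below is a minimal, hypothetical sketch of the decoupled exploration/exploitation fine-tuning loop outlined in the abstract. The toy environment, the linear Gaussian policy, the `guide_coef` action blending, and the snapshot-based teacher refresh are all illustrative assumptions introduced to keep the sketch self-contained and runnable; they are not the paper's actual GDE implementation.

```python
import copy
import numpy as np

class ToyEnv:
    """Stand-in environment so the sketch runs end-to-end (assumption, not D4RL)."""
    def __init__(self, obs_dim=3, act_dim=2, horizon=50, seed=0):
        self.obs_dim, self.act_dim, self.horizon = obs_dim, act_dim, horizon
        self.rng = np.random.default_rng(seed)
    def reset(self):
        self.t = 0
        return self.rng.standard_normal(self.obs_dim)
    def step(self, action):
        self.t += 1
        reward = -float(np.sum(action ** 2))  # toy reward
        return self.rng.standard_normal(self.obs_dim), reward, self.t >= self.horizon

class LinearGaussianPolicy:
    """Toy actor: Gaussian with a linear mean (stand-in for an offline-pretrained policy)."""
    def __init__(self, obs_dim, act_dim, std=0.1, seed=0):
        self.W = np.zeros((act_dim, obs_dim))
        self.std = std
        self.rng = np.random.default_rng(seed)
    def act(self, obs, explore=True):
        mean = self.W @ obs
        noise = self.std * self.rng.standard_normal(mean.shape) if explore else 0.0
        return mean + noise

def evaluate(env, policy, episodes=3):
    total = 0.0
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            obs, r, done = env.step(policy.act(obs, explore=False))
            total += r
    return total / episodes

def update_policies(exploit, explore, buffer):
    """Placeholder for the actual off-policy RL updates (e.g., TD3/SAC-style losses)."""
    pass

def finetune(env, offline_policy, steps=2000, eval_every=500, guide_coef=0.5):
    exploit = copy.deepcopy(offline_policy)   # exploitation policy: what we evaluate/deploy
    explore = copy.deepcopy(offline_policy)   # exploration policy: no conservative constraint
    teacher = copy.deepcopy(offline_policy)   # teacher: best exploitation snapshot so far
    best_return = evaluate(env, exploit)
    buffer, obs = [], env.reset()
    for t in range(steps):
        # Blend the unconstrained exploration action toward the teacher's action
        # so that exploration stays near a known-good policy (illustrative guidance rule).
        a = (1 - guide_coef) * explore.act(obs) + guide_coef * teacher.act(obs, explore=False)
        next_obs, reward, done = env.step(a)
        buffer.append((obs, a, reward, next_obs, done))
        obs = env.reset() if done else next_obs
        update_policies(exploit, explore, buffer)
        # Refresh the teacher only when the exploitation policy actually improves.
        if (t + 1) % eval_every == 0:
            ret = evaluate(env, exploit)
            if ret > best_return:
                best_return, teacher = ret, copy.deepcopy(exploit)
    return exploit

if __name__ == "__main__":
    env = ToyEnv()
    pretrained = LinearGaussianPolicy(env.obs_dim, env.act_dim)
    finetune(env, pretrained)
```

The structural point the sketch tries to convey is that the exploration policy can act without conservative constraints, while the teacher, refreshed only when the exploitation policy improves, keeps exploration from drifting into reckless actions that would ruin the learned policy.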
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2067