Guiding Online Reinforcement Learning with Action-Free Offline Pretraining

TMLR Paper 1815 Authors

11 Nov 2023 (modified: 17 Sept 2024) · Withdrawn by Authors · CC BY 4.0
Abstract: Offline RL methods have been shown to reduce the need for environment interaction by training agents on offline-collected episodes. However, the action information in offline episodes can be difficult or even impossible to collect in some practical cases. This paper investigates how action-free offline datasets can be used to improve online reinforcement learning. We introduce Action-Free Guide (AF-Guide), a method that extracts task-relevant knowledge from separate action-free offline datasets. AF-Guide employs an Action-Free Decision Transformer (AFDT) that learns from such datasets to plan the next states given desired future returns. In turn, AFDT guides an online-learning agent trained by Guided Soft Actor-Critic (Guided SAC). Experiments show that AF-Guide improves the sample efficiency and performance of online RL. Our code is included in the supplementary material and will be made publicly available.
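To make the division of labor in the abstract concrete, below is a minimal sketch of the guidance loop it describes. This is an illustration under assumptions, not the paper's implementation: the `AFDTStub` class, the distance-based intrinsic reward, and all names here are hypothetical stand-ins for the state planner and the shaping signal that Guided SAC would consume.

```python
import numpy as np

class AFDTStub:
    """Hypothetical stand-in for the Action-Free Decision Transformer.

    A real AFDT would be a transformer trained on action-free offline
    trajectories (states and returns only); here we just predict a small
    step from the last observed state so the sketch runs end to end.
    """

    def plan_next_state(self, state_history: list, return_to_go: float) -> np.ndarray:
        last = state_history[-1]
        # Toy "plan": nudge the state toward the origin, capped step size.
        return last - 0.1 * np.sign(last) * min(return_to_go, 1.0)


def guidance_reward(next_state: np.ndarray, planned_state: np.ndarray) -> float:
    """Intrinsic shaping signal: negative distance to the planned state.

    Guided SAC could train on a reward like this so the policy is pulled
    toward the states AFDT plans (an assumed form of the guidance).
    """
    return -float(np.linalg.norm(next_state - planned_state))


# One step of the assumed guidance loop.
afdt = AFDTStub()
history = [np.array([1.0, -2.0, 0.5])]
planned = afdt.plan_next_state(history, return_to_go=100.0)

next_state = np.array([0.9, -1.8, 0.45])  # state reached by the online agent
r_guide = guidance_reward(next_state, planned)
print(f"planned state: {planned}, guidance reward: {r_guide:.3f}")
```

Note that no actions appear anywhere on the offline side of this sketch: the planner consumes and emits only states and returns, and the online agent receives the offline knowledge purely through the shaping reward.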
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Pablo_Samuel_Castro1
Submission Number: 1815