Keywords: Reinforcement Learning, Cross-Domain Policy Transfer, Sim-to-Real, Hybrid Offline-and-Online RL
TL;DR: H2O+ offers great flexibility to bridge various choices of offline and online learning methods while also accounting for dynamics gaps between the real and simulation environments.
Abstract: Solving complex real-world tasks with reinforcement learning (RL) can be quite challenging without high-fidelity simulation environments or large amounts of offline data. Online RL agents trained in imperfect simulation environments can suffer from severe sim-to-real issues. Offline RL approaches, although they bypass the need for simulators, often impose demanding requirements on the size and quality of the offline datasets. The recently emerging hybrid offline-and-online RL paradigm provides an attractive framework that enables the joint use of limited offline data and an imperfect simulator for transferable policy learning. In this paper, we develop a new algorithm, called H2O+, which offers great flexibility to bridge various choices of offline and online learning methods, while also accounting for dynamics gaps between the real and simulation environments. Through extensive simulation and real-world robotics experiments, we demonstrate superior performance and flexibility over advanced cross-domain online and offline RL algorithms.
Primary Subject Area: Other
Paper Type: Research paper: up to 8 pages
DMLR For Good Track: Participate in DMLR for Good Track
Participation Mode: In-person
Confirmation: I have read and agree with the workshop's policy on behalf of myself and my co-authors.
Submission Number: 11