Keywords: Offline Reinforcement Learning, Behavior Cloning, Ensemble Method
TL;DR: An ensemble-based, model-free actor-critic algorithm with an adjustable behavior cloning (BC) term for offline RL.
Abstract: In this work, we build upon the offline reinforcement learning algorithm TD3+BC \cite{fujimoto2021minimalist} and propose a model-free actor-critic algorithm with an adjustable behavior cloning (BC) term. We employ an ensemble of networks to quantify the uncertainty of the estimated value function, thereby mitigating overestimation. Moreover, we introduce a convenient and intuitive mechanism for controlling the degree of BC: a Bernoulli random variable parameterized by a user-specified confidence level for each offline dataset. Our proposed algorithm, named Ensemble-based actor critic with Adaptive Behavior Cloning (EABC), is straightforward to implement, exhibits low variance, and achieves strong performance across all D4RL MuJoCo benchmarks.
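The abstract's two ingredients, an ensemble lower bound on the value estimate and a Bernoulli switch on the BC term, can be sketched as a toy actor objective. This is an illustrative reconstruction, not the authors' code: the function name `eabc_actor_loss`, the parameters `p_bc` and `alpha`, and the TD3+BC-style normalizer are assumptions for the sketch.

```python
import numpy as np

def eabc_actor_loss(q_values, pi_actions, batch_actions, p_bc=0.5, alpha=2.5):
    """Illustrative EABC-style actor objective (not the authors' code).

    q_values:      (n_ensemble, batch) ensemble Q-estimates of the policy's actions
    pi_actions:    (batch, action_dim) actions proposed by the policy
    batch_actions: (batch, action_dim) actions from the offline dataset
    p_bc:          assumed user-specified confidence level gating the BC term
    alpha:         TD3+BC-style scale hyperparameter (assumption)
    """
    # Pessimistic value: penalize actions the ensemble is uncertain about.
    q_lcb = q_values.mean(axis=0) - q_values.std(axis=0)
    # TD3+BC-style normalization of the Q term against its own magnitude.
    lam = alpha / np.abs(q_lcb).mean()
    # Behavior-cloning regularizer: mean squared error to dataset actions.
    bc = np.mean((pi_actions - batch_actions) ** 2)
    # Bernoulli switch: the BC term is active with probability p_bc.
    gate = np.random.binomial(1, p_bc)
    return -(lam * q_lcb).mean() + gate * bc
```

Setting `p_bc` high keeps the policy close to the dataset (useful for narrow datasets), while a low `p_bc` lets the pessimistic Q term dominate.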
Submission Number: 29