Keywords: Offline Reinforcement Learning, Behavior Cloning, Ensemble Method
TL;DR: An ensemble-based, model-free actor-critic algorithm with an adjustable behavior cloning (BC) term for offline RL.
Abstract: In this work, we build upon the offline reinforcement learning algorithm TD3+BC \cite{fujimoto2021minimalist} and propose a model-free actor-critic algorithm with an adjustable behavior cloning (BC) term. We employ an ensemble of networks to quantify the uncertainty of the estimated value function, thereby mitigating overestimation. Moreover, we introduce a convenient and intuitive mechanism for controlling the degree of BC: a Bernoulli random variable parameterized by a user-specified confidence level for each offline dataset. Our proposed algorithm, named Ensemble-based actor critic with Adaptive Behavior Cloning (EABC), is straightforward to implement, exhibits low variance, and achieves strong performance across all D4RL MuJoCo benchmarks.
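The abstract's two ingredients, an ensemble lower bound on the value estimate and a Bernoulli switch on the BC term, can be sketched as a toy actor objective. This is an illustrative reconstruction, not the authors' code: the function name `eabc_actor_loss`, the parameters `p_bc` and `alpha`, and the TD3+BC-style normalizer are assumptions for the sketch.

```python
import numpy as np

def eabc_actor_loss(q_values, pi_actions, batch_actions, p_bc=0.5, alpha=2.5):
    """Illustrative EABC-style actor objective (not the authors' code).

    q_values:      (n_ensemble, batch) ensemble Q-estimates of the policy's actions
    pi_actions:    (batch, action_dim) actions proposed by the policy
    batch_actions: (batch, action_dim) actions from the offline dataset
    p_bc:          assumed user-specified confidence level gating the BC term
    alpha:         TD3+BC-style scale hyperparameter (assumption)
    """
    # Pessimistic value: penalize actions the ensemble is uncertain about.
    q_lcb = q_values.mean(axis=0) - q_values.std(axis=0)
    # TD3+BC-style normalization of the Q term against its own magnitude.
    lam = alpha / np.abs(q_lcb).mean()
    # Behavior-cloning regularizer: mean squared error to dataset actions.
    bc = np.mean((pi_actions - batch_actions) ** 2)
    # Bernoulli switch: the BC term is active with probability p_bc.
    gate = np.random.binomial(1, p_bc)
    return -(lam * q_lcb).mean() + gate * bc
```

Setting `p_bc` high keeps the policy close to the dataset (useful for narrow datasets), while a low `p_bc` lets the pessimistic Q term dominate.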
Submission Number: 29