Keywords: Ensemble learning, reinforcement learning, continuous control, PPO
TL;DR: A novel ensemble reinforcement learning algorithm named Cascade that uses a convex combination of its base policies.
Abstract: Though reinforcement learning has been successfully applied to a variety of domains, there
is still room left for improvement, in particular, in terms of the final performance. Ensemble
Reinforcement Learning (ERL) tries to enhance reinforcement learning techniques by using
multiple models or algorithms. We propose a novel ERL technique, called Cascade which in
the context of continuous control tasks and with PPO as the base training algorithm clearly
outperforms standard PPO in terms of the final performance. To shine light on the working
mechanisms of Cascade, we conduct ablation studies, showing how the different components of
Cascade contribute to its overall performance. Furthermore, we demonstrate that Cascade has a
robust monotonicity as the ensemble’s performance increases with each additional base agent
even when weak base agents are added in large numbers.
Submission Number: 32
Loading