MEPG: A Minimalist Ensemble Policy Gradient Framework for Deep Reinforcement Learning

Qiang He; Huangyuan Su; Chen GONG; Xinwen Hou

28 May 2022 (modified: 16 Mar 2025), DARL 2022, Readers: Everyone

Keywords: Reinforcement Learning, Ensemble Learning

TL;DR: We design a novel and simple ensemble deep RL framework that integrates multiple models into a single model, addressing the heavy resource consumption of ensemble methods without introducing additional computational cost compared to DDPG and SAC.

Abstract: During the training of a reinforcement learning (RL) agent, the distribution of training data is non-stationary because the agent's behavior changes over time. The agent may therefore overspecialize to a particular distribution, so that its overall performance suffers. Ensemble RL can mitigate this issue by learning a robust policy. However, it suffers from heavy computational resource consumption due to the additional value and policy functions it introduces. In this paper, to avoid this resource consumption issue, we design a novel and simple ensemble deep RL framework that integrates multiple models into a single model. Specifically, we propose the Minimalist Ensemble Policy Gradient framework (MEPG), which introduces a minimalist ensemble-consistent Bellman update by utilizing a modified dropout operator. MEPG retains the ensemble property by keeping dropout consistent on both sides of the Bellman equation. Additionally, the dropout operator increases MEPG's generalization capability. Moreover, we theoretically show that the policy evaluation phase in MEPG maintains two synchronized deep Gaussian Processes. To verify the MEPG framework's ability to generalize, we perform experiments on the Gym simulator, which show that MEPG outperforms or matches the current state-of-the-art ensemble methods and model-free methods without incurring additional computational resource costs.
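
The idea of keeping dropout consistent on both sides of the Bellman equation can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the modified dropout operator, so the sketch assumes ordinary inverted dropout on a single hidden layer, and every name in it (`sample_mask`, `q_value`, `consistent_td_error`, `keep_prob`) is hypothetical. The one point it tries to show is that a single sampled mask, and therefore a single sub-network of the ensemble, is used both for the current estimate Q(s, a) and for the bootstrapped target r + γQ'(s', a').

```python
import random

def sample_mask(width, keep_prob, rng):
    """Sample one inverted-dropout mask; kept units are scaled by 1/keep_prob."""
    return [(1.0 / keep_prob) if rng.random() < keep_prob else 0.0
            for _ in range(width)]

def q_value(params, state_action, mask):
    """One-hidden-layer Q-function with ReLU; the mask picks a sub-network."""
    w1, w2 = params
    hidden = [max(0.0, sum(w * x for w, x in zip(row, state_action)))
              for row in w1]
    return sum(w * h * m for w, h, m in zip(w2, hidden, mask))

def consistent_td_error(params, target_params, sa, reward, next_sa, gamma, mask):
    """TD error with the SAME mask applied on both sides of the Bellman equation."""
    target = reward + gamma * q_value(target_params, next_sa, mask)
    return target - q_value(params, sa, mask)

# Toy setup (assumed sizes): 3-dimensional state-action input, 8 hidden units.
rng = random.Random(0)
width, dim = 8, 3
params = ([[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(width)],
          [rng.uniform(-1, 1) for _ in range(width)])
target_params = params  # assumption: target network starts as a copy

# One mask per update, shared by the online and target evaluations.
mask = sample_mask(width, keep_prob=0.9, rng=rng)
delta = consistent_td_error(params, target_params,
                            [0.1, 0.2, 0.3], 1.0, [0.2, 0.1, 0.0], 0.99, mask)
```

Sampling a fresh mask per update while sharing it across both evaluations is what lets one set of weights stand in for many ensemble members in this sketch; sampling two independent masks instead would compare the estimate of one sub-network against the target of another.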

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/mepg-a-minimalist-ensemble-policy-gradient/code)
