Probabilistic Mixture-of-Experts for Efficient Deep Reinforcement Learning

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission
Keywords: Deep Reinforcement Learning, Sample Efficiency, Gaussian Mixture Models, Mixture-of-Experts
Abstract: Deep reinforcement learning (DRL) has recently solved a variety of problems, typically with a unimodal policy representation. However, grasping the decomposable and hierarchical structures within a complex task can be essential for further improving learning efficiency and performance, which may call for a multimodal policy or a mixture-of-experts (MOE). To the best of our knowledge, present general-purpose DRL algorithms do not deploy MOE methods as policy function approximators, either because they lack differentiability or because they lack an explicit probabilistic representation. In this work, we propose a differentiable probabilistic mixture-of-experts (PMOE) embedded in the end-to-end training scheme for generic off-policy and on-policy algorithms using stochastic policies, e.g., Soft Actor-Critic (SAC) and Proximal Policy Optimisation (PPO). Experimental results demonstrate the advantageous performance of our method over unimodal policies, three different MOE methods, and an option-framework method, based on two types of DRL algorithms. We also demonstrate the distinguishable primitives learned with PMOE in different environments.
One-sentence Summary: An end-to-end differentiable probabilistic mixture-of-experts for improving the exploration and sample efficiency in DRL.
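
For readers wanting a concrete picture of the policy class involved, below is a minimal PyTorch sketch of a K-component Gaussian-mixture policy head. This is not the paper's PMOE formulation; the class name, network sizes, and clamping bounds are illustrative assumptions. It only shows the standard GMM policy parameterisation, whose sampling step is not reparameterisable out of the box, which is the differentiability issue that motivates PMOE.

    import torch
    import torch.nn as nn
    from torch.distributions import Categorical, Independent, MixtureSameFamily, Normal

    class GaussianMixturePolicy(nn.Module):
        """Illustrative K-expert Gaussian mixture policy head (not the paper's PMOE)."""

        def __init__(self, obs_dim, act_dim, num_experts=4, hidden=256):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
            )
            self.logits = nn.Linear(hidden, num_experts)               # mixture weights
            self.means = nn.Linear(hidden, num_experts * act_dim)      # per-expert means
            self.log_stds = nn.Linear(hidden, num_experts * act_dim)   # per-expert log std-devs
            self.num_experts, self.act_dim = num_experts, act_dim

        def forward(self, obs):
            h = self.backbone(obs)
            mix = Categorical(logits=self.logits(h))
            means = self.means(h).view(-1, self.num_experts, self.act_dim)
            stds = self.log_stds(h).view(-1, self.num_experts, self.act_dim).clamp(-5, 2).exp()
            components = Independent(Normal(means, stds), 1)
            # log_prob of this mixture is differentiable; sample() is not reparameterised,
            # which is why a naive GMM policy cannot be trained like a reparameterised Gaussian.
            return MixtureSameFamily(mix, components)

    # Usage (hypothetical dimensions): dist = GaussianMixturePolicy(17, 6)(obs)
    # action = dist.sample(); log_prob = dist.log_prob(action)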
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Supplementary Material: zip
Reviewed Version (pdf): https://openreview.net/references/pdf?id=DpPUvrXiNQ