- Abstract: Ensemble learning, which consistently improves prediction performance in supervised learning, has drawn increasing attention in reinforcement learning (RL). However, most related works focus on adopting ensemble methods for environment dynamics modeling and value function approximation, which are essentially supervised learning tasks within the RL regime. Moreover, given the inevitable differences between RL and supervised learning, the conclusions and theories of existing ensemble supervised learning cannot be directly transferred to policy learning in RL. Adapting ensemble methods to policy learning has not been well studied and remains an open problem. In this work, we propose to learn ensemble policies under the same RL objective in an end-to-end manner, in which sub-policy training and policy ensembling are combined organically and optimized simultaneously. We further prove theoretically that ensemble policy learning can improve exploration efficacy by increasing the entropy of the action distribution. In addition, we incorporate a diversity-enhancement regularization over the policy space, which preserves the ensemble policy's ability to generalize to unseen states. Experimental results on two complex grid-world environments and one real-world application demonstrate that our proposed method achieves significantly higher sample efficiency and better policy generalization performance.
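The entropy claim above rests on a standard fact: Shannon entropy is concave, so a mixture (ensemble) of sub-policy action distributions has entropy at least as high as the average of the sub-policies' entropies. The sketch below illustrates this with hypothetical distributions `p1` and `p2`; it is a minimal numerical check, not the paper's algorithm.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a discrete distribution (natural log)."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p + 1e-12))

# Two hypothetical sub-policy action distributions over three actions,
# each fairly peaked (low entropy) on a different action.
p1 = np.array([0.8, 0.1, 0.1])
p2 = np.array([0.1, 0.1, 0.8])

# Uniform-weight ensemble policy: the mixture of the two sub-policies.
mix = 0.5 * (p1 + p2)

# Concavity of entropy: H(mix) >= 0.5 * H(p1) + 0.5 * H(p2),
# i.e. ensembling cannot reduce entropy below the sub-policy average,
# which is the mechanism behind the improved-exploration argument.
assert entropy(mix) >= 0.5 * (entropy(p1) + entropy(p2))
```

Here the ensemble spreads mass across actions that each sub-policy individually neglects, which is precisely why a higher-entropy mixed policy explores more broadly.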