Decentralized Cross-Entropy Method for Model-Based Reinforcement Learning

Published: 28 Jan 2022, Last Modified: 13 Feb 2023
ICLR 2022 Submitted
Readers: Everyone
Keywords: Reinforcement Learning, Cross-Entropy Method, Planning, Model-Based RL
Abstract: The Cross-Entropy Method (CEM) is a popular approach to planning in model-based reinforcement learning. It has so far always taken a \textit{centralized} approach, where the sampling distribution is updated \textit{centrally} based on the result of a top-$k$ operation applied to \textit{all samples}. We show that such a \textit{centralized} approach makes CEM vulnerable to local optima and impairs its sample efficiency, even in a one-dimensional multi-modal optimization task. In this paper, we propose \textbf{Decent}ralized \textbf{CEM (DecentCEM)}, in which an ensemble of CEM instances runs independently, each performing a local improvement of its own sampling distribution. On the exemplar optimization task, the proposed decentralized approach DecentCEM finds the global optimum far more consistently than existing CEM approaches that use either a single Gaussian distribution or a mixture of Gaussians. We further extend the decentralized approach to sequential decision-making problems, showing on 13 continuous control benchmark environments that it matches or outperforms state-of-the-art CEM algorithms in most cases, under the same total sample budget for planning.
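The following is a minimal sketch, not the authors' implementation, of the contrast the abstract describes: a standard centralized CEM that fits one Gaussian to the top-$k$ of all samples, versus a decentralized ensemble of independent CEM instances that splits the same total sample budget and keeps the best instance's solution. The 1-D multi-modal objective, all hyperparameters, and the names `cem` and `decent_cem` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def objective(x):
    # Illustrative 1-D multi-modal function (not the paper's exact task):
    # global optimum near x = 2, a weaker local optimum near x = -2.
    return np.exp(-(x - 2.0) ** 2) + 0.8 * np.exp(-(x + 2.0) ** 2)

def cem(mu, sigma, n_samples=20, top_k=5, iters=30, rng=None):
    """Centralized CEM: one Gaussian, refit each iteration to the
    top-k elites of all samples drawn that iteration."""
    if rng is None:
        rng = np.random.default_rng()
    for _ in range(iters):
        x = rng.normal(mu, sigma, size=n_samples)
        elites = x[np.argsort(objective(x))[-top_k:]]
        mu, sigma = elites.mean(), elites.std() + 1e-6
    return mu

def decent_cem(n_instances=4, n_samples=20, top_k=5, iters=30, seed=0):
    """DecentCEM sketch: several CEM instances run independently, each
    improving only its own sampling distribution. The per-instance sample
    budget is divided so the total matches the centralized case."""
    rng = np.random.default_rng(seed)
    per_instance = n_samples // n_instances
    per_elite = max(1, top_k // n_instances)
    mus = rng.uniform(-4.0, 4.0, size=n_instances)  # spread-out initial means
    solutions = [cem(m, 1.0, per_instance, per_elite, iters, rng) for m in mus]
    # Report the best solution found across all instances.
    return max(solutions, key=objective)

print("centralized CEM :", cem(0.0, 2.0))
print("DecentCEM sketch:", decent_cem())
```

With a poor initial mean, the single centralized Gaussian can collapse onto the local optimum near $x = -2$; because each instance in the sketch explores from its own initialization, at least one typically converges to the global optimum near $x = 2$, matching the abstract's claim under the same total sample budget.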
One-sentence Summary: This paper proposes a Decentralized Cross-Entropy Method (DecentCEM) that generalizes the conventional Cross-Entropy Method to planning with multiple independent instances.
Supplementary Material: zip