Keywords: Reinforcement Learning, Cross-Entropy Method, Planning, Model-Based RL
Abstract: The Cross-Entropy Method (CEM) is a popular approach to planning in model-based reinforcement learning.
It has so far always taken a \textit{centralized} approach where the sampling distribution is updated \textit{centrally} based on the result of a top-$k$ operation applied to \textit{all samples}.
We show that such a \textit{centralized} approach makes CEM vulnerable to local optima and impairs its sample efficiency, even in a one-dimensional multi-modal optimization task.
In this paper, we propose \textbf{Decent}ralized \textbf{CEM (DecentCEM)}, in which an ensemble of CEM instances run independently of one another, each performing a local improvement of its own sampling distribution.
In the exemplar optimization task, DecentCEM finds the global optimum much more consistently than existing CEM approaches that use either a single Gaussian distribution or a mixture of Gaussians.
Further, we extend the decentralized approach to sequential decision-making problems and show, on 13 continuous-control benchmark environments, that it matches or outperforms state-of-the-art CEM algorithms in most cases under the same total sample budget for planning.
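To make the contrast concrete, below is a minimal sketch (not the authors' implementation) of centralized CEM versus a decentralized ensemble of independent CEM instances, on a one-dimensional multi-modal objective like the exemplar task described above. The objective function, hyperparameters (\texttt{pop\_size}, \texttt{n\_elite}, \texttt{n\_workers}, \texttt{iters}), and the rule for picking the best instance at the end are all illustrative assumptions.

\begin{verbatim}
# Sketch only: centralized CEM vs. a decentralized ensemble of
# independent CEM instances (DecentCEM-style). All names and
# hyperparameters are illustrative, not from the paper.
import numpy as np

def objective(x):
    # 1-D multi-modal function: global optimum near x = 5,
    # weaker local optimum near x = -5.
    return np.exp(-0.5 * (x - 5.0) ** 2) + 0.8 * np.exp(-0.5 * (x + 5.0) ** 2)

def cem(mu, sigma, pop_size, n_elite, iters, rng):
    # Centralized CEM: a single Gaussian, refit at each iteration
    # to the top-k (elite) samples drawn from it.
    for _ in range(iters):
        samples = rng.normal(mu, sigma, size=pop_size)
        elites = samples[np.argsort(objective(samples))[-n_elite:]]
        mu, sigma = elites.mean(), elites.std() + 1e-6
    return mu

def decent_cem(n_workers, pop_size, n_elite, iters, rng):
    # Decentralized ensemble: each instance updates only its own
    # sampling distribution from its own samples; the best final
    # mean is returned. Total samples per iteration match one
    # centralized run of size n_workers * pop_size.
    mus = [cem(rng.uniform(-10, 10), 3.0, pop_size, n_elite, iters, rng)
           for _ in range(n_workers)]
    return max(mus, key=objective)

rng = np.random.default_rng(0)
print("centralized  :", cem(0.0, 3.0, pop_size=50, n_elite=5, iters=30, rng=rng))
print("decentralized:", decent_cem(n_workers=5, pop_size=10, n_elite=2, iters=30, rng=rng))
\end{verbatim}

Under this toy setup, a single centralized Gaussian can collapse onto whichever mode its early elites happen to favor, whereas independently initialized instances cover multiple modes and the ensemble can report the best one, matching the failure mode and remedy the abstract describes.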
One-sentence Summary: We propose a Decentralized Cross-Entropy Method (DecentCEM) that generalizes the conventional Cross-Entropy Method to run multiple independent instances for planning.
Supplementary Material: zip