Bregman Centroid Guided Cross-Entropy Method

TMLR Paper5542 Authors

04 Aug 2025 (modified: 13 Aug 2025) | Under review for TMLR | CC BY 4.0
Abstract: The Cross-Entropy Method (CEM) is a widely adopted trajectory optimizer in model-based reinforcement learning (MBRL), but its unimodal sampling strategy often leads to premature convergence in multimodal landscapes. In this work, we propose Bregman-Centroid Guided CEM (BC-EvoCEM), a lightweight enhancement to ensemble CEM that leverages Bregman centroids for principled information aggregation and diversity control. BC-EvoCEM computes a performance-weighted Bregman centroid across CEM workers and updates the least-contributing workers by sampling within a trust region around the centroid. Leveraging the duality between Bregman divergences and exponential family distributions, we show that BC-EvoCEM integrates seamlessly into standard CEM pipelines with negligible overhead. Empirical results on synthetic benchmarks, a cluttered navigation task, full MBRL pipelines, and a real-world quadruped robot demonstrate that BC-EvoCEM improves both convergence and solution quality, providing a simple yet effective upgrade for CEM.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Romain_Laroche1
Submission Number: 5542
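
The aggregation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each CEM worker maintains a diagonal Gaussian, that higher scores are better, that performance weights come from a softmax over worker scores, and that the "trust region" is a per-coordinate box of half-width `radius` around the centroid mean. All function names, the softmax weighting, and the box-shaped trust region are hypothetical choices made for illustration. The centroid shown is the moment-matching one (the weighted average of expectation parameters), which minimizes the weighted sum of KL(p_i || q) over Gaussians q; the paper may use a different Bregman centroid.

```python
import numpy as np

def performance_weights(scores, temperature=1.0):
    """Softmax weights over worker scores (assumed: higher score is better)."""
    s = np.asarray(scores, dtype=float) / temperature
    e = np.exp(s - s.max())  # subtract the max for numerical stability
    return e / e.sum()

def bregman_centroid_diag_gauss(means, variances, weights):
    """Moment-matching centroid of diagonal Gaussians.

    Averages the expectation parameters (E[x], E[x^2]) with the given
    weights, then converts back to a mean and a variance.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    first = w @ means                       # weighted E[x]
    second = w @ (variances + means ** 2)   # weighted E[x^2]
    return first, second - first ** 2

def reset_worst_worker(means, variances, scores, radius, rng):
    """Re-initialize the lowest-weight worker near the centroid (illustrative)."""
    w = performance_weights(scores)
    c_mean, c_var = bregman_centroid_diag_gauss(means, variances, w)
    worst = int(np.argmin(w))
    new_means = np.array(means, dtype=float)
    new_vars = np.array(variances, dtype=float)
    # Sample a perturbation from the centroid Gaussian, then clip it to the box.
    step = rng.normal(size=c_mean.shape) * np.sqrt(c_var)
    new_means[worst] = c_mean + np.clip(step, -radius, radius)
    new_vars[worst] = c_var
    return new_means, new_vars, worst

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    means = [[0.0, 0.0], [2.0, 0.0], [5.0, 5.0]]
    variances = [[1.0, 1.0]] * 3
    scores = [1.0, 0.0, 3.0]
    new_means, new_vars, worst = reset_worst_worker(means, variances, scores, 0.5, rng)
    print("reset worker", worst, "to mean", new_means[worst])
```

Because the centroid is a closed-form weighted average, the extra cost per iteration in this sketch is linear in the number of workers and the action dimension, which is in line with the abstract's claim of negligible overhead.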