Bregman Centroid Guided Cross-Entropy Method

TMLR Paper5542 Authors

04 Aug 2025 (modified: 11 Oct 2025) | Under review for TMLR | CC BY 4.0
Abstract: The Cross-Entropy Method (CEM) is a widely adopted trajectory optimizer in model-based reinforcement learning (MBRL), but its unimodal sampling strategy often leads to premature convergence in multimodal landscapes. In this work, we propose \textbf{$\mathcal B$regman-$\mathcal C$entroid Guided CEM ($\mathcal{BC}$-EvoCEM)}, a lightweight enhancement to ensemble CEM that leverages \emph{Bregman centroids} for principled information aggregation and diversity control. BC-EvoCEM computes a performance-weighted Bregman centroid across CEM workers and updates the least contributing ones by sampling within a trust region around the centroid. Leveraging the duality between Bregman divergences and exponential family distributions, we show that BC-EvoCEM integrates seamlessly into standard CEM pipelines with negligible overhead. Empirical results on synthetic benchmarks, a cluttered navigation task, full MBRL pipelines, and a real-world quadruped robot demonstrate that BC-EvoCEM enhances both convergence and solution quality, providing a simple yet effective upgrade for CEM.
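To make the "performance-weighted Bregman centroid across CEM workers" concrete, here is a minimal sketch for diagonal Gaussian workers. It is not the authors' implementation, and several choices are assumptions: the softmax weighting over worker scores, the `temperature` parameter, and the divergence direction. The sketch takes the centroid to be the Gaussian `q` minimizing the weighted sum of `KL(p_i || q)`, which for exponential families reduces to averaging expectation parameters (moment matching). The paper may use a different direction or weighting. The trust-region resampling of the least contributing workers is not covered here.

```python
import numpy as np

def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL(p || q) between two diagonal Gaussians."""
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )

def weighted_centroid(mus, variances, scores, temperature=1.0):
    """Moment-matched centroid of K diagonal Gaussians (assumed direction).

    mus, variances: arrays of shape (K, d); scores: shape (K,), higher is better.
    """
    # Assumed weighting: softmax over worker scores, stabilized by subtracting the max.
    z = (scores - scores.max()) / temperature
    w = np.exp(z) / np.exp(z).sum()
    # Average the expectation parameters E[x] and E[x^2] across workers.
    mean = w @ mus
    second_moment = w @ (variances + mus ** 2)
    return w, mean, second_moment - mean ** 2

# Three hypothetical CEM workers in 2D, each with a mean, a variance, and a score.
mus = np.array([[0.0, 1.0], [2.0, -1.0], [4.0, 0.5]])
variances = np.array([[1.0, 0.5], [0.3, 2.0], [1.5, 1.0]])
scores = np.array([1.0, 3.0, 2.0])

w, c_mu, c_var = weighted_centroid(mus, variances, scores)

def objective(mu_q, var_q):
    """Weighted sum of KL(p_i || q) that the centroid is meant to minimize."""
    return sum(
        w_i * kl_diag_gauss(m, v, mu_q, var_q)
        for w_i, m, v in zip(w, mus, variances)
    )
```

Because the centroid is available in closed form, aggregating the workers costs only a few weighted averages per iteration, which is in line with the abstract's claim of negligible overhead. Evaluating `objective` at `(c_mu, c_var)` and at nearby perturbed parameters is a quick way to check that the centroid is the minimizer under the assumed direction.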
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:

### Notes for the reviewers

Due to the major revision and the drop-in patch nature of our updated version, we did not highlight changes inline. Instead, we have placed **red-highlighted markers** on all drop-in patches (e.g., (NEW) Example: BC-EvoCEM with Gaussians in Section 4.1) to help guide you through the revisions.

---

## Major revision on the main content

1. Extended **Section 2 Preliminaries** with facts on the close relationship between exponential family distributions and Bregman divergences.
2. Added **Section 3 Problem Statement** to state the problem explicitly and early.
3. Revised **Section 4 Method**:
   - summarized the main results on the proposed method for EF members in the paragraph *BC-EvoCEM in Exponential Families*; the technical development is deferred to Appendix A
   - added a concrete, step-by-step example of the proposed BC-EvoCEM with Gaussians in the paragraph *Example: BC-EvoCEM with Gaussians*
4. Revised **Section 5 Bregman Centroid-Guided MPC for MBRL**:
   - summarized how the proposed BC-EvoCEM fits into MBRL with the MPC framework; a detailed and extended exposition is provided in Appendix B
   - revised Algorithm 2 for clarity
5. Added **Section 6: Why Bregman divergence and its centroid?** to 1) formally justify the use of the Bregman divergence/centroid in the CEM setting and 2) structurally compare it with other heuristic alternatives.
6. Extended *Limitations* in **Section 8 Conclusion** to clarify the scope of the proposed method and potential solutions in future work.

## Revision on the supplementary material (Appendix)

1. **Appendix A**: a detailed development of the proposed BC-EvoCEM in exponential families (EFs), showing how the three major steps (centroid, score, and trust-region sampling) are tied to EF-Bregman geometry in the CEM setting. Practical considerations are discussed.
2. **Appendix B**: a *self-contained* MBRL section with background on MBRL with an MPC policy and CEM as the trajectory optimizer (B.1-2). B.3 provides an extended, detailed version of Section 5 in the main text with concrete diagonal Gaussian examples.
3. **Appendix C**: added proofs for the statements in Section 6.
4. **Appendix D**: added additional experiments with implementation and model setup details, additional results, and discussions.

### Minors

1. Revised the Figure 1 caption for clarity.
2. Revised Algorithms 1 and 2 for clarity (added comments).
3. Fixed a typo on the *direction* of the Bregman divergence. Added a clarification and a remark in Appendix A.1.
Assigned Action Editor: ~Romain_Laroche1
Submission Number: 5542