Focusing on the Riskiest: Gaussian Mixture Models for Safe Reinforcement Learning

ICLR 2026 Conference Submission16745 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0

Keywords: Safe Reinforcement Learning, Gaussian Mixture Models, Conditional Value-at-Risk

Abstract: Reinforcement learning under safety constraints remains a fundamental challenge. While primal–dual formulations provide a principled framework for enforcing such constraints, their effectiveness depends critically on accurate modeling of cost distributions. Existing approaches often impose Gaussian assumptions and approximate risk either by the mean or by CVaR, yet these formulations inherently fail to capture complex, multimodal, or heavy-tailed risks. To overcome these limitations, we propose GMM-SSAC (Gaussian Mixture Model-Based Supremum CVaR-Guided Safe Soft Actor-Critic), whose core is the Supremum Conditional Value-at-Risk (SCVaR) criterion: a coherent and robust safety measure that explicitly targets the worst-case tail across all components of a Gaussian mixture. To support accurate SCVaR estimation online, we introduce an incremental EM-based update that refines the GMM parameters by blending instantaneous safety samples with Bellman-transformed estimates, ensuring unbiased, convergent parameter estimates for reliable SCVaR computation. Empirical evaluations on standard safety benchmarks demonstrate that GMM-SSAC substantially improves risk sensitivity and safety while maintaining competitive task performance, validating SCVaR as a principled and effective cost estimator for safe reinforcement learning.
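
The abstract does not give a formula for SCVaR, so the following is only a minimal sketch of one plausible reading: compute the closed-form upper-tail CVaR of each Gaussian component of the cost mixture and take the largest value. The function names `gaussian_cvar` and `supremum_cvar`, the decision to ignore mixture weights, and the example parameters are all assumptions made for illustration and are not taken from the paper.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def gaussian_cvar(mean: float, std: float, alpha: float) -> float:
    """Upper-tail CVaR at level alpha of a Gaussian cost N(mean, std^2).

    Uses the standard closed form mean + std * pdf(z) / (1 - alpha),
    where z is the alpha-quantile of the standard normal.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    z = _STD_NORMAL.inv_cdf(alpha)
    return mean + std * _STD_NORMAL.pdf(z) / (1.0 - alpha)


def supremum_cvar(means, stds, alpha: float) -> float:
    """Largest per-component CVaR over all Gaussian components.

    Assumption: the supremum is taken over components and the mixture
    weights are ignored; the paper's definition may differ.
    """
    return max(gaussian_cvar(m, s, alpha) for m, s in zip(means, stds))


# Hypothetical two-component cost mixture: a narrow mode with a higher
# mean and a wide mode with a lower mean.
example_means = [1.0, 0.0]
example_stds = [0.1, 1.0]
scvar = supremum_cvar(example_means, example_stds, alpha=0.95)
```

In this hypothetical example the wide component has the lower mean but the heavier tail, so it determines the result. That illustrates why a mean-based or single-Gaussian risk estimate could understate the risk the abstract describes.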

Supplementary Material: zip

Primary Area: reinforcement learning

Submission Number: 16745
