Safety-Aware Reinforcement Learning via Contrastive State Representations

ICLR 2026 Conference Submission 13798 Authors

18 Sept 2025 (modified: 08 Oct 2025) · License: CC BY 4.0
Keywords: Safe RL, Contrastive Learning
TL;DR: We propose a contrastive learning framework to learn safe state representations for policies.
Abstract: In model-free Safe Reinforcement Learning (Safe RL), agents are tasked with satisfying constraints in high-dimensional environments, yet they typically learn from state representations that do not explicitly encode safety information. This forces them into a prolonged trial-and-error cycle in which learning is split between satisfying constraints and maximizing reward. We argue that this is not fundamentally a policy learning problem but a representation problem. To address it, we introduce Self Supervised Safe Reinforcement Learning (S3RL), a framework that jointly learns a control policy and safety-aware state representations. The representations are learned by maximizing the mutual information (MI) between state embeddings and their corresponding safety labels. We optimize this MI objective with a contrastive InfoNCE loss that learns to distinguish safe states from unsafe ones. The representation learning module is algorithm-agnostic and can be integrated into a variety of Safe RL algorithms. Integrating it into a Lagrangian-based soft actor-critic update, we prove that the joint objective guarantees stable and monotonic policy improvement. Experiments on multiple safety benchmarks show that our method alleviates the conflict between exploration and constraint satisfaction, producing policies that achieve higher rewards than state-of-the-art Safe RL baselines without compromising safety.
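To make the abstract's contrastive objective concrete, below is a minimal sketch of an InfoNCE-style loss that pulls together embeddings of states sharing the same safety label and pushes apart states with different labels. All names (`SafetyEncoder`, `safety_infonce_loss`, the temperature value) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a supervised InfoNCE loss over safety labels,
# assuming binary labels (0 = safe, 1 = unsafe) from a replay buffer.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SafetyEncoder(nn.Module):
    """Maps raw states to embeddings intended to encode safety information."""

    def __init__(self, state_dim: int, embed_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, embed_dim),
        )

    def forward(self, states: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.net(states), dim=-1)  # unit-norm embeddings


def safety_infonce_loss(z: torch.Tensor, safety_labels: torch.Tensor,
                        temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE where states with the same safety label act as positives.

    z: (B, D) normalized embeddings; safety_labels: (B,) binary labels.
    """
    sim = z @ z.t() / temperature                          # (B, B) similarity logits
    batch = z.size(0)
    self_mask = torch.eye(batch, dtype=torch.bool, device=z.device)
    pos_mask = safety_labels.unsqueeze(0) == safety_labels.unsqueeze(1)
    pos_mask = pos_mask & ~self_mask                       # positives: same label, not self

    # log-softmax over all non-self pairs, then average over each anchor's positives
    sim = sim.masked_fill(self_mask, float("-inf"))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    loss = -log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_counts
    return loss.mean()


# Usage: embed a batch of states and add this loss, with a weighting
# coefficient, to the actor-critic update of a Lagrangian-based SAC.
encoder = SafetyEncoder(state_dim=8)
states = torch.randn(32, 8)
labels = torch.randint(0, 2, (32,))
loss = safety_infonce_loss(encoder(states), labels)
loss.backward()
```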
Primary Area: reinforcement learning
Submission Number: 13798