Provably Safe Representation Learning in CMDPs: A Primal-Dual Approach

16 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Representation Learning; Safe Representation Learning; Constrained Markov Decision Process
Abstract: We study representation learning in low-rank Constrained Markov Decision Processes (CMDPs) with unknown dynamics, where the agent must maximize rewards under safety constraints. While representation learning has significantly advanced for unconstrained MDPs, its extension to CMDPs remains open due to the critical challenge of safe exploration under learned features, particularly concerning the management of soft constraint violation. In this work, we propose REP-PD, the first algorithm that provably integrates representation learning with policy optimization in low-rank CMDPs. By iteratively learning a low-rank transition representation via MLE and utilizing a composite Q-function tied to the unconstrained Lagrangian, REP-PD guides policy updates to balance reward maximization, exploration, and robust constraint adherence. Through this approach, REP-PD achieves a near-optimal policy with a sampling complexity bound independent of the state space dimension without prior feature knowledge. Notably, REP-PD's regret matches the lower bounds for unconstrained low-rank MDPs, achieving strong performance concerning soft constraint violation. We then consider a stronger hard constraint violation metric, where the agent must strictly satisfy constraints at all times, and propose REP-PD-hard by designing a novel policy optimization module. Our work thus provides a robust and theoretically grounded approach to representation learning in constrained reinforcement learning, with guarantees on bounded soft and hard constraint violation.
Primary Area: reinforcement learning
Submission Number: 7104