Zero-Shot Constraint Satisfaction with Forward-Backward Representations

Published: 01 Jul 2025, Last Modified: 01 Jul 2025
Venue: RLBrew: Ingredients for Developing Generalist Agents workshop (RLC 2025)
License: CC BY 4.0
Keywords: Reinforcement learning, constrained reinforcement learning, successor measure
TL;DR: A principled algorithm for performing zero-shot constrained policy optimization using forward-backward representations
Abstract: Traditionally, constrained policy optimization with Reinforcement Learning (RL) requires learning a new policy from scratch for any new environment, goal, or cost function, with limited generalization to new tasks and constraints. Given the sample inefficiency of many common deep RL methods, this procedure can be impractical for real-world scenarios, particularly when constraints or tasks change. As an alternative, in the unconstrained setting, various works have sought to pre-train representations from offline datasets to accelerate policy optimization once a reward is specified. Such methods can permit faster adaptation to new tasks in a given environment, dramatically improving sample efficiency. Recently, zero-shot policy optimization has been explored by leveraging a particular $\textit{forward-backward}$ decomposition of the successor measure to learn compact, task-agnostic representations of the environment dynamics. However, these methods have been primarily studied in the unconstrained setting. In this work, we introduce a principled inference-time procedure for zero-shot $\textit{constrained}$ policy optimization from forward-backward representations and demonstrate its empirical performance on illustrative environments. Finally, we show that even in simple environments, there remains an optimality gap in zero-shot constrained policy optimization, inviting future developments in this area.
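To make the setting concrete, the following is a minimal sketch of how zero-shot task inference typically works with forward-backward representations, and one way a cost signal might be folded in at inference time via a Lagrangian-style combination. This is an illustration under assumptions, not the paper's actual procedure; all names (`B`, `reward`, `cost`, `lagrange_multiplier`) and the constraint-handling strategy are hypothetical.

```python
# Sketch: zero-shot task inference with FB representations, plus an assumed
# Lagrangian-style way of trading off a reward and a constraint cost at
# inference time. Not the paper's algorithm; values and shapes are toy.
import numpy as np

rng = np.random.default_rng(0)

# Pretend these are pre-trained FB backward embeddings B(s) for states
# sampled from the offline data distribution rho.
num_states, embed_dim = 1024, 32
B = rng.normal(size=(num_states, embed_dim))

# Reward and cost functions are specified only at inference time (zero-shot).
states = rng.normal(size=(num_states, 4))          # placeholder state features
reward = (states[:, 0] > 0).astype(float)          # illustrative reward signal
cost = (np.abs(states[:, 1]) > 1.0).astype(float)  # illustrative constraint cost

# Standard FB zero-shot inference: the task embedding is the reward-weighted
# expectation of the backward embedding, z_r = E_rho[ r(s) B(s) ].
z_reward = (reward[:, None] * B).mean(axis=0)

# The analogous embedding for the cost signal.
z_cost = (cost[:, None] * B).mean(axis=0)

# One simple (assumed) inference-time trade-off: combine the two embeddings
# with a multiplier that would be tuned so the induced policy meets the budget.
lagrange_multiplier = 0.5
z = z_reward - lagrange_multiplier * z_cost

# The pre-trained policy family pi_z(a | s) would then be conditioned on z;
# here we only report the embedding that selects the policy.
print("task embedding z (first 5 dims):", z[:5])
```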
Submission Number: 12