Explicit Pareto Front Optimization for Constrained Reinforcement Learning

28 Sept 2020 (modified: 05 May 2023), ICLR 2021 Conference Blind Submission
Keywords: constrained reinforcement learning, multi-objective reinforcement learning, continuous control, deep reinforcement learning
Abstract: Many real-world problems require that reinforcement learning (RL) agents learn policies that not only maximize a scalar reward, but do so while meeting constraints, such as remaining below an energy consumption threshold. Typical approaches for solving constrained RL problems rely on Lagrangian relaxation, but these suffer from several limitations. We draw a connection between multi-objective RL and constrained RL, based on the key insight that the constraint-satisfying optimal policy must be Pareto optimal. This leads to a novel, multi-objective perspective for constrained RL. We propose a framework that uses a multi-objective RL algorithm to find a Pareto front of policies that trades off the reward against the constraint(s), and simultaneously searches along this front for constraint-satisfying policies. We show that in practice, an instantiation of our framework outperforms existing approaches on several challenging continuous control domains, both in terms of solution quality and sample efficiency, and can flexibly recover a portion of the Pareto front rather than a single constraint-satisfying policy.
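To make the core idea concrete, here is a minimal toy sketch (not the paper's algorithm, which operates on learned policies rather than a fixed candidate set): given candidate policies evaluated on (reward, constraint cost), keep the Pareto front over maximizing reward and minimizing cost, then search along that front for the highest-reward policy whose cost meets the threshold. All names here (PolicyEval, cost_threshold) are illustrative assumptions.

```python
# Toy illustration of the Pareto-front view of constrained RL.
# Assumes each candidate policy has already been evaluated for its
# expected reward (to maximize) and expected constraint cost (to minimize).

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PolicyEval:
    name: str
    reward: float  # expected return (maximize)
    cost: float    # expected constraint cost, e.g., energy use (minimize)


def pareto_front(evals: List[PolicyEval]) -> List[PolicyEval]:
    """Keep policies not dominated by another (>= reward AND <= cost, one strict)."""
    front = []
    for p in evals:
        dominated = any(
            q.reward >= p.reward and q.cost <= p.cost
            and (q.reward > p.reward or q.cost < p.cost)
            for q in evals
        )
        if not dominated:
            front.append(p)
    return sorted(front, key=lambda p: p.cost)


def best_feasible(front: List[PolicyEval], cost_threshold: float) -> Optional[PolicyEval]:
    """Among Pareto-optimal policies, pick the highest-reward constraint-satisfying one."""
    feasible = [p for p in front if p.cost <= cost_threshold]
    return max(feasible, key=lambda p: p.reward) if feasible else None


if __name__ == "__main__":
    evals = [
        PolicyEval("a", reward=1.0, cost=0.2),
        PolicyEval("b", reward=2.5, cost=0.9),
        PolicyEval("c", reward=2.0, cost=1.5),  # dominated by "b"
        PolicyEval("d", reward=3.0, cost=2.0),
    ]
    front = pareto_front(evals)
    print([p.name for p in front])                   # ['a', 'b', 'd']
    print(best_feasible(front, cost_threshold=1.0))  # policy "b"
```

The sketch also shows why recovering a portion of the front is useful: once the front is known, any feasible point along it can be selected after the fact, without re-solving for a new constraint threshold.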
One-sentence Summary: We introduce a novel framework for constrained RL by leveraging the ability of multi-objective RL algorithms to find Pareto-optimal solutions.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=f2S5O90CWG