Keywords: Model-based RL, Safe RL, Safety Filter, Exploration
TL;DR: We combine constrained model-based policy optimization with planning-based safety filters as backup policies to reduce constraint violation rates during exploration.
Abstract: Applying reinforcement learning (RL) to learn effective policies on physical robots without supervision remains challenging for tasks where safe exploration is critical. Constrained model-based RL (CMBRL) is a promising approach to this problem: these methods learn constraint-adhering policies through constrained optimization. Yet such policies often fail to meet stringent safety requirements during learning and exploration. Our method, CASE, aims to reduce the number of constraint violations incurred during the learning phase. Specifically, CASE combines constrained policy optimization with planning-based safety filters that act as backup policies, lowering constraint violation rates during learning and making it a more reliable option than other recent constrained model-based policy optimization methods.
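The core idea in the abstract, falling back from the learned constrained policy to a planning-based backup when a proposed action looks unsafe under a learned model, can be illustrated with a minimal sketch. This is not the paper's CASE algorithm; all names (task_policy, backup_planner, dynamics_model, constraint_cost, horizon, budget) are hypothetical interfaces assumed for illustration.

def is_predicted_safe(state, action, dynamics_model, backup_planner,
                      constraint_cost, horizon=10, budget=0.0):
    # Roll the learned dynamics model forward: first the proposed action,
    # then the backup planner, and accumulate the predicted constraint cost.
    total_cost = 0.0
    s = dynamics_model.step(state, action)      # one step with the proposed action
    total_cost += constraint_cost(s)
    for _ in range(horizon - 1):                # remaining steps under the backup policy
        a = backup_planner.plan(s)
        s = dynamics_model.step(s, a)
        total_cost += constraint_cost(s)
    return total_cost <= budget

def filtered_action(state, task_policy, backup_planner, dynamics_model, constraint_cost):
    # Return the task policy's action when it is predicted to satisfy the
    # constraint budget; otherwise fall back to the planning-based backup.
    proposed = task_policy.act(state)
    if is_predicted_safe(state, proposed, dynamics_model,
                         backup_planner, constraint_cost):
        return proposed
    return backup_planner.plan(state)           # backup keeps exploration within constraints

In this generic pattern, the filter only overrides the task policy when the model-based rollout predicts a constraint breach, so exploration is restricted as little as possible while violations during learning are reduced.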
Supplementary Material: zip
Spotlight Video: mp4
Publication Agreement: pdf
Student Paper: no
Submission Number: 631