Viability of Future Actions: Robust Reinforcement Learning via Entropy Regularization

Published: 01 Aug 2024, Last Modified: 09 Oct 2024 · EWRL17 · CC BY 4.0
Keywords: reinforcement learning, viability, safe reinforcement learning, robust learning
TL;DR: Entropy-regularized RL with constraint penalties learns robust policies by preserving the number of viable future actions.
Abstract: Despite the many recent advances in reinforcement learning (RL), the question of learning policies that robustly satisfy state constraints under disturbances remains open. This paper reveals how robustness arises naturally by combining two common practices in unconstrained RL: entropy regularization and constraint penalization. Our results provide a method to learn robust policies that is model-free and works with standard, popular algorithms. We begin by showing how entropy regularization biases the constrained RL problem towards maximizing the number of future viable actions, which is a form of robustness. Then, we relax the safety constraints via penalties to obtain an unconstrained RL problem, which we show approximates its constrained counterpart arbitrarily closely. We support our findings with illustrative examples and with experiments on popular RL benchmarks.
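
The abstract describes relaxing state constraints into reward penalties so that an off-the-shelf entropy-regularized algorithm (e.g., SAC) can be applied unchanged. The sketch below illustrates one way this penalization could be set up as an environment wrapper; it is not the authors' code, and the wrapper name `ConstraintPenaltyWrapper`, the `penalty_coef` value, and the Pendulum constraint are illustrative assumptions.

```python
# Minimal sketch: turn a state-constrained task into an unconstrained one by
# subtracting a penalty whenever the constraint is violated. Entropy
# regularization is then supplied by the training algorithm (e.g., SAC),
# not by this wrapper. Names and values here are assumptions for illustration.
import gymnasium as gym


class ConstraintPenaltyWrapper(gym.Wrapper):
    """Subtracts a fixed penalty from the reward whenever the observed
    state violates a user-supplied constraint."""

    def __init__(self, env, constraint_fn, penalty_coef=10.0):
        super().__init__(env)
        self.constraint_fn = constraint_fn  # returns True if the state is safe
        self.penalty_coef = penalty_coef

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        if not self.constraint_fn(obs):
            reward -= self.penalty_coef
            info["constraint_violated"] = True
        return obs, reward, terminated, truncated, info


if __name__ == "__main__":
    # Example (hypothetical constraint): penalize the pendulum leaving the
    # upper half-plane, i.e. require cos(theta) > 0.
    env = ConstraintPenaltyWrapper(
        gym.make("Pendulum-v1"),
        constraint_fn=lambda obs: obs[0] > 0.0,
        penalty_coef=10.0,
    )
    obs, _ = env.reset(seed=0)
    for _ in range(5):
        obs, reward, terminated, truncated, _ = env.step(env.action_space.sample())
    env.close()
```

In this setup, the penalized environment can be passed directly to an entropy-regularized learner; the paper's claim is that the resulting policy then implicitly favors states from which many viable actions remain.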
Submission Number: 134