Data Efficient Safe Reinforcement Learning

Sindhu Padakandla, Prabuchandran K. J., Sourav Ganguly, Shalabh Bhatnagar

Published: 2022, Last Modified: 06 Jan 2026, SMC 2022, CC BY-SA 4.0

Abstract: Applying reinforcement learning (RL) methods to real-world applications poses multiple challenges, foremost among them the safety of the system controlled by the learning agent and the efficiency of learning. An RL agent learns to control a system by exploring the available actions in various operating states. In some states, an exploratory action may drive the system into unsafe operation, which can create hazards both for the system and for the humans supervising it. RL algorithms must therefore learn to control the system while respecting safety constraints. In this work, we formulate the safe RL problem in the constrained off-policy setting, which facilitates safe exploration by the RL agent. We then develop a sample-efficient algorithm utilizing the cross-entropy method. The proposed algorithm’s safety performance is evaluated numerically on benchmark RL problems.
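For readers unfamiliar with the cross-entropy method mentioned in the abstract, the following is a minimal, generic sketch of a constrained cross-entropy search over a single scalar parameter. It is not the algorithm proposed in the paper: the `reward` and `cost` functions, the cost limit, the ranking rule (feasible samples by reward, then infeasible samples by smallest violation), and all hyperparameters are hypothetical choices made only for illustration.

```python
import random
import statistics


def reward(theta):
    # Hypothetical objective (assumption): peaks at theta = 3.
    return -(theta - 3.0) ** 2


def cost(theta):
    # Hypothetical safety cost (assumption): grows with theta.
    return theta


def constrained_cem(cost_limit=2.0, iterations=50, population=100,
                    elite_frac=0.1, seed=0):
    """Generic constrained cross-entropy search over a scalar parameter."""
    rng = random.Random(seed)
    mean, std = 0.0, 2.0  # initial Gaussian sampling distribution
    n_elite = max(2, int(population * elite_frac))
    for _ in range(iterations):
        samples = [rng.gauss(mean, std) for _ in range(population)]
        feasible = [s for s in samples if cost(s) <= cost_limit]
        infeasible = [s for s in samples if cost(s) > cost_limit]
        # Rank feasible samples by reward (best first), then infeasible
        # samples by how little they violate the constraint.
        feasible.sort(key=reward, reverse=True)
        infeasible.sort(key=cost)
        elites = (feasible + infeasible)[:n_elite]
        # Refit the sampling distribution to the elite set; the small
        # constant keeps the standard deviation from collapsing to zero.
        mean = statistics.mean(elites)
        std = statistics.stdev(elites) + 1e-3
    return mean


if __name__ == "__main__":
    print(constrained_cem())
```

In this toy setup the unconstrained optimum (theta = 3) violates the cost limit, so the search should settle near the constraint boundary at theta = 2 instead. In an RL setting, theta would stand for policy parameters, and the reward and cost would be estimated from trajectories rather than computed in closed form.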

External IDs: dblp:conf/smc/PadakandlaJGB22