Safe Reinforcement Learning with Contrastive Risk Prediction

Published: 17 Jun 2024, Last Modified: 28 Jun 2024 · FoRLaC Poster · CC BY 4.0
Abstract: As safety violations can lead to severe consequences in real-world applications, the increasing deployment of Reinforcement Learning (RL) in safety-critical domains such as robotics has propelled the study of safe exploration for reinforcement learning (safe RL). In this work, we propose a risk preventive training method for safe RL, which learns a binary classifier based on contrastive sampling to predict the probability of a state-action pair leading to unsafe states. Based on the predicted risk probabilities, risk preventive trajectory exploration and optimality criterion modification can be conducted simultaneously to induce safe RL policies. We conduct experiments in robotic simulation environments. The results show that the proposed approach outperforms existing model-free safe RL approaches and yields performance comparable to the state-of-the-art model-based method.
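The core mechanism described in the abstract, a binary classifier trained on contrastively sampled safe and unsafe state-action pairs to estimate risk probabilities, can be illustrated with a minimal sketch. Everything below is a toy stand-in, not the paper's implementation: the 1-D environment, the hazard condition, and the `risk` function are invented for illustration, and a simple logistic-regression classifier stands in for whatever model the authors use.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D world (hypothetical): state s in [0, 1], action a in [-1, 1].
# Assumed hazard, purely illustrative: the next state s + 0.1*a is
# "unsafe" if it exceeds 0.9.
def is_unsafe(s, a):
    return s + 0.1 * a > 0.9

# Contrastive sampling: collect state-action pairs that lead to unsafe
# states (label 1) alongside safe ones (label 0).
S = rng.uniform(0.0, 1.0, size=5000)
A = rng.uniform(-1.0, 1.0, size=5000)
y = is_unsafe(S, A).astype(float)
X = np.column_stack([S, A, np.ones_like(S)])  # features + bias term

# Binary risk classifier: logistic regression fit by gradient descent
# on the log-loss.
w = np.zeros(3)
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-X @ w))       # predicted risk probability
    w -= 0.5 * X.T @ (p - y) / len(y)      # gradient step

def risk(s, a):
    """Predicted probability that (s, a) leads to an unsafe state."""
    return 1.0 / (1.0 + np.exp(-(w[0] * s + w[1] * a + w[2])))
```

Given such a classifier, a risk preventive scheme in the spirit of the abstract could both steer exploration away from high-risk actions and penalize the RL objective by the predicted risk, though the exact way the paper combines the two is not specified here.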
Format: Long format (up to 8 pages + refs, appendix)
Publication Status: No
Submission Number: 63