Abstract: We present REVEL, a partially neural reinforcement learning (RL) framework for
provably safe exploration in continuous state and action spaces. A key challenge
for provably safe deep RL is that repeatedly verifying neural networks within a
learning loop is computationally infeasible. We address this challenge using two
policy classes: a general, neurosymbolic class with approximate gradients and a
more restricted class of symbolic policies that allows efficient verification. Our
learning algorithm is a mirror descent over policies: in each iteration, it safely lifts
a symbolic policy into the neurosymbolic space, performs safe gradient updates to
the resulting policy, and projects the updated policy into the safe symbolic subset,
all without requiring explicit verification of neural networks. Our empirical results
show that REVEL enforces safe exploration in many scenarios in which Constrained
Policy Optimization does not, and that it can discover policies that outperform
those learned through prior approaches to verified exploration.
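The mirror-descent loop described in the abstract can be summarized schematically. The sketch below is purely illustrative and is not the authors' implementation: the `lift`, `safe_gradient_step`, and `project_to_safe_symbolic` operations are hypothetical placeholders standing in for REVEL's actual components (the safe embedding of a symbolic policy into the neurosymbolic class, the approximate-gradient update, and the projection back into the verifiable symbolic subset).

```python
# Illustrative sketch of the lift / update / project structure described in
# the abstract. All three operations are hypothetical stand-ins, not REVEL's
# actual implementation.

def mirror_descent_training(initial_symbolic_policy, env, n_iterations,
                            lift, safe_gradient_step, project_to_safe_symbolic):
    """Alternate between a verifiable symbolic policy class and a richer
    neurosymbolic class, following the lift -> update -> project pattern."""
    g = initial_symbolic_policy              # symbolic policy, assumed verified safe
    for _ in range(n_iterations):
        h = lift(g)                          # embed g into the neurosymbolic class
        h = safe_gradient_step(h, env)       # approximate policy-gradient update,
                                             # constrained so exploration stays safe
        g = project_to_safe_symbolic(h)      # project back into the efficiently
                                             # verifiable symbolic subset
    return g
```

The point of the loop is that verification is only ever needed for the restricted symbolic class, while learning proceeds in the expressive neurosymbolic class; no neural network is verified inside the loop.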