- Abstract: We present CROP, the first framework for Certifying Robust Policies for reinforcement learning against adversarial state perturbations. We propose two types of robustness certification criteria: robustness of per-state actions and lower bounds on cumulative rewards. Specifically, we develop a local smoothing algorithm, which uses a policy derived from Q-functions smoothed with Gaussian noise at each encountered state to guarantee the robustness of the actions taken along the trajectory (sketched below). Next, we develop a global smoothing algorithm for certifying a lower bound on the finite-horizon cumulative reward under adversarial state perturbations. Finally, we propose a local smoothing approach that uses adaptive search to obtain tight certification bounds on the reward. We use the proposed RL robustness certification framework to evaluate six methods previously shown to yield empirically robust RL, including adversarial training and several forms of regularization, on two representative Atari games. We show that RegPGD, RegCVX, and RadialRL achieve high certified robustness among these methods. Furthermore, we demonstrate that our certifications are often tight by evaluating these algorithms against adversarial attacks.
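As a minimal sketch of the local smoothing idea the abstract describes (not the paper's implementation), the snippet below estimates Gaussian-smoothed Q-values by Monte Carlo averaging over perturbed states and then acts greedily on the smoothed values. The names `q_function`, `sigma`, and `n_samples` are illustrative assumptions, not CROP's actual interface.

```python
import numpy as np

def smoothed_q_values(q_function, state, sigma=0.1, n_samples=1000, rng=None):
    """Monte Carlo estimate of Gaussian-smoothed Q-values.

    Averages Q(state + noise) over isotropic Gaussian noise, following the
    randomized-smoothing idea described in the abstract. `q_function` is
    assumed to map a state array to a vector of per-action Q-values; the
    noise level `sigma` and sample count `n_samples` are placeholders.
    """
    rng = np.random.default_rng() if rng is None else rng
    state = np.asarray(state, dtype=np.float64)
    total = None
    for _ in range(n_samples):
        # Perturb the state with i.i.d. Gaussian noise and query Q on it.
        noisy_state = state + rng.normal(0.0, sigma, size=state.shape)
        q = np.asarray(q_function(noisy_state), dtype=np.float64)
        total = q if total is None else total + q
    return total / n_samples

def smoothed_policy_action(q_function, state, sigma=0.1, n_samples=1000):
    """Act greedily with respect to the smoothed Q-function."""
    return int(np.argmax(smoothed_q_values(q_function, state, sigma, n_samples)))
```

The certification itself (bounding how much an L2-bounded state perturbation can change the smoothed Q-values, and hence whether the greedy action can flip) is omitted here; this sketch only illustrates the smoothing step that the certified policy acts on.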