Keywords: Reinforcement Learning, Robustness, Formal Methods, Zonotope, Set-Based Computing, Robust Reinforcement Learning, Adversarial Training
TL;DR: We contribute an observation-robust reinforcement learning algorithm that leverages set-based neural network training to obtain agents suitable for formal verification.
Abstract: Deep reinforcement learning uses neural networks to solve complex control tasks.
However, neural networks are sensitive to input perturbations, which makes their deployment in safety-critical environments challenging and their formal verification necessary. This work lifts recent results from the formal verification of neural networks to reinforcement learning with continuous state and action spaces. While previous work on robust reinforcement learning mainly focuses on adversarial attacks, we augment reinforcement learning with set-based computing: we enclose all possible outputs for a set of perturbed inputs and compute a gradient set for training, i.e., each possible output has its own gradient. This lets us control the size of the propagated sets, yielding favorable worst-case bounds on actions and value functions that enable formal verification across different verification frameworks for up to 9 times larger input perturbations. Our work bridges the gap between state-of-the-art adversarial training methods and formal verification, training verifiably robust agents that are applicable in safety-critical environments.
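To illustrate the set-based computing the abstract refers to, below is a minimal sketch of propagating a zonotope (a set represented by a center vector and generator matrix) through an affine layer to obtain worst-case output bounds for all inputs within an ℓ∞ perturbation. The layer weights, perturbation radius, and helper function names here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def zonotope_affine(c, G, W, b):
    """Propagate the zonotope <c, G> through an affine layer x -> W x + b.

    Affine maps act exactly on zonotopes: the center is mapped through the
    layer and the generators are multiplied by the weight matrix.
    """
    return W @ c + b, W @ G

# Input set: all observations x with |x_i - c_i| <= eps, written as a
# zonotope with one generator per perturbed input dimension.
eps = 0.1
c = np.array([1.0, -0.5])          # nominal observation (illustrative)
G = eps * np.eye(2)

W = np.array([[2.0, 0.0],          # illustrative layer weights
              [1.0, 1.0]])
b = np.array([0.1, 0.0])

c_out, G_out = zonotope_affine(c, G, W, b)

# Worst-case interval bounds of the output set: center plus/minus the
# sum of absolute generator entries per output dimension.
radius = np.abs(G_out).sum(axis=1)
lower, upper = c_out - radius, c_out + radius
```

Nonlinear activations (e.g., ReLU) require an additional over-approximation step that may add generators; controlling the growth of these sets during training is what yields the favorable worst-case bounds described above.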
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 7890