On Robust Reinforcement Learning with Lipschitz-Bounded Policy Networks

Published: 17 Jun 2024, Last Modified: 16 Jul 2024
Venue: FoRLaC Poster
License: CC BY 4.0
Abstract: This paper presents a study of robust policy networks in deep reinforcement learning. We investigate the benefits of policy parameterizations that naturally satisfy constraints on their Lipschitz bound, analyzing their empirical performance and robustness on two representative problems: pendulum swing-up and Atari Pong. We illustrate that policy networks with small Lipschitz bounds are significantly more robust to disturbances, random noise, and targeted adversarial attacks than unconstrained policies composed of vanilla multi-layer perceptrons or convolutional neural networks. Moreover, we find that choosing a policy parameterization with a non-conservative Lipschitz bound and an expressive, nonlinear layer architecture gives the user much finer control over the performance-robustness trade-off than existing state-of-the-art methods based on spectral normalization.
Format: Long format (up to 8 pages + refs, appendix)
Publication Status: Yes
Submission Number: 52
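
The abstract refers to a network's Lipschitz bound and to spectral normalization without defining either. As a minimal sketch, and not the parameterization studied in the paper, the NumPy code below uses the standard fact that a feedforward network with 1-Lipschitz activations (here tanh) has a Lipschitz constant no larger than the product of its layers' spectral norms. The helper names (`spectral_norm`, `lipschitz_upper_bound`, `spectrally_normalize`, `mlp`), the layer sizes, and the target bound `gamma` are all assumptions chosen for illustration.

```python
import numpy as np

def spectral_norm(W):
    # Largest singular value of W (the matrix 2-norm).
    return np.linalg.norm(W, 2)

def lipschitz_upper_bound(weights):
    # Product of layer spectral norms. This is an upper bound on the
    # Lipschitz constant when every activation is 1-Lipschitz.
    return float(np.prod([spectral_norm(W) for W in weights]))

def spectrally_normalize(weights, gamma=1.0):
    # Rescale each layer so its spectral norm equals gamma**(1/depth),
    # which makes the product bound equal to gamma (hypothetical helper).
    per_layer = gamma ** (1.0 / len(weights))
    return [W * (per_layer / spectral_norm(W)) for W in weights]

def mlp(weights, x):
    # Bias-free MLP with tanh hidden activations (tanh is 1-Lipschitz).
    h = x
    for W in weights[:-1]:
        h = np.tanh(W @ h)
    return weights[-1] @ h

rng = np.random.default_rng(0)
# Illustrative layer shapes: 4 inputs, two hidden layers of 16, 2 outputs.
weights = [
    rng.normal(size=(16, 4)),
    rng.normal(size=(16, 16)),
    rng.normal(size=(2, 16)),
]

normalized = spectrally_normalize(weights, gamma=2.0)
bound = lipschitz_upper_bound(normalized)

# Empirical check on one random pair of inputs: the output change
# divided by the input change should never exceed the bound.
x, y = rng.normal(size=4), rng.normal(size=4)
ratio = np.linalg.norm(mlp(normalized, x) - mlp(normalized, y)) / np.linalg.norm(x - y)
```

The product bound is only an upper bound and can be considerably larger than the network's true Lipschitz constant, so the observed `ratio` will typically fall well below `bound`. That slack is one way to read the abstract's contrast between spectral normalization and parameterizations with a non-conservative Lipschitz bound.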