On the Tension Between Optimality and Adversarial Robustness in Policy Optimization

Published: 26 Jan 2026, Last Modified: 01 Mar 2026, ICLR 2026 Poster, CC BY 4.0
Keywords: reinforcement learning, adversarial robustness, policy optimization, theory-practice gap, bilevel optimization
TL;DR: This paper demystifies the gap between theory and practice in adversarially robust reinforcement learning and addresses it by proposing BARPO, a novel framework that bridges adversarial robustness and optimality.
Abstract: Optimality and adversarial robustness in deep reinforcement learning have long been regarded as conflicting goals. Nonetheless, recent theoretical insights presented in CAR suggest a potential alignment, raising the important question of how to realize it in practice. This paper first identifies a key gap between theory and practice by comparing standard policy optimization (SPO) and adversarially robust policy optimization (ARPO). Although the two are theoretically consistent, *a fundamental tension between robustness and optimality arises in practical policy gradient methods*: SPO tends to converge to vulnerable first-order stationary policies (FOSPs) with strong natural performance, whereas ARPO typically favors more robust FOSPs at the expense of reduced returns. Furthermore, we attribute this tradeoff to the *reshaping effect of the strongest adversaries* in ARPO, which significantly complicates the global landscape by inducing *deceptive sticky FOSPs*; this improves robustness but makes the landscape harder to navigate. To alleviate this, we develop *BARPO*, a bilevel framework that unifies SPO and ARPO by modulating adversary strength, thereby facilitating navigability while preserving global optima. Extensive empirical results demonstrate that BARPO consistently outperforms vanilla ARPO, providing a practical approach to reconciling theoretical and empirical performance.
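For intuition only, a minimal sketch of what such a bilevel objective could look like is given below. The notation is assumed here for illustration and is not taken from the paper: $\pi_\theta$ denotes the agent policy, $\nu_\phi$ a state-perturbing adversary, $J$ the expected return, $\epsilon$ the full adversary budget, and $\lambda \in [0,1]$ a hypothetical strength modulator interpolating between SPO ($\lambda = 0$) and full-strength ARPO ($\lambda = 1$).

```latex
% Hypothetical sketch (not the paper's formulation): a bilevel objective in
% which the outer level trains the policy against an adversary whose strength
% is modulated by \lambda, interpolating between SPO (\lambda = 0, no
% perturbation) and full-strength ARPO (\lambda = 1, full budget \epsilon).
\begin{align*}
  \max_{\theta} \quad & J\bigl(\pi_\theta \circ \nu_{\phi^\star(\theta,\lambda)}\bigr) \\
  \text{s.t.} \quad   & \phi^\star(\theta,\lambda) \in
      \operatorname*{arg\,min}_{\phi}\; J\bigl(\pi_\theta \circ \nu_\phi\bigr)
      \quad \text{where } \|\nu_\phi(s) - s\|_\infty \le \lambda\,\epsilon
      \ \text{ for all states } s.
\end{align*}
```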
Primary Area: reinforcement learning
Submission Number: 11884