Solving Robust MDPs through No-Regret Dynamics

TMLR Paper 2364 Authors

10 Mar 2024 (modified: 21 Mar 2024) · Under review for TMLR · CC BY-SA 4.0
Abstract: Reinforcement learning is a powerful framework for training agents to act in varied situations, but it is sensitive to changes in environment dynamics. Designing an algorithm that efficiently finds environmentally robust policies and handles different model parameterizations, without imposing stringent assumptions on the uncertainty set of transitions, is difficult because of the intricate interaction between policy and environment. In this paper, we address both issues with a No-Regret Dynamics framework that uses policy gradient methods and iteratively approximates the worst-case environment during training, thereby avoiding assumptions on the uncertainty set. Combined with a toolbox of nonconvex online learning algorithms, our framework achieves fast convergence rates across many problem settings while relaxing assumptions on the uncertainty set of transitions.
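The abstract gives no pseudocode; as a rough illustration only, the following is a minimal sketch of one plausible instantiation of such no-regret dynamics on a tabular finite-horizon MDP. The sa-rectangular L1 uncertainty ball, the Hedge-style policy update, and all sizes, radii, and step sizes below are assumptions made to keep the sketch concrete; they are not the paper's construction (the paper explicitly avoids stringent assumptions on the uncertainty set).

```python
import numpy as np

# Sketch assumptions: tabular finite-horizon MDP, known reward table,
# and a nominal transition kernel around which the adversary may move.
S, A, H = 4, 2, 10                             # states, actions, horizon
rng = np.random.default_rng(0)
R = rng.uniform(size=(S, A))                   # reward table r(s, a)
P_nom = rng.dirichlet(np.ones(S), size=(S, A)) # nominal kernel, shape (S, A, S)

def q_values(policy, P):
    """H-step Q-values and state values of `policy` under kernel `P`."""
    V = np.zeros(S)
    Q = np.zeros((S, A))
    for _ in range(H):
        Q = R + P @ V                          # (S,A,S) @ (S,) -> (S,A)
        V = (policy * Q).sum(axis=1)
    return Q, V

def worst_case_kernel(V, radius):
    """Adversary step: approximate worst-case kernel in an L1 ball of
    `radius` around the nominal transitions (a sketch assumption, not
    the paper's uncertainty set)."""
    P = P_nom.copy()
    worst = np.argmin(V)                       # least valuable successor
    donors = np.argsort(V)[::-1]               # take mass from high-value states
    for s in range(S):
        for a in range(A):
            budget = radius / 2.0
            for sp in donors:
                if budget <= 0:
                    break
                if sp == worst:
                    continue
                shift = min(P[s, a, sp], budget)
                P[s, a, sp] -= shift
                P[s, a, worst] += shift
                budget -= shift
    return P

eta, radius, T = 0.5, 0.2, 200
theta = np.zeros((S, A))                       # softmax policy logits
P = P_nom
for t in range(T):
    policy = np.exp(theta - theta.max(axis=1, keepdims=True))
    policy /= policy.sum(axis=1, keepdims=True)
    # Adversary approximates the worst-case environment for the current policy.
    _, V = q_values(policy, P)
    P = worst_case_kernel(V, radius)
    # Policy player: Hedge-style mirror-ascent step on the robust Q-values,
    # standing in here for the paper's policy gradient update (assumption).
    Q, V = q_values(policy, P)
    theta += eta * Q

print("robust value of final policy:", V.mean())
```

The alternation mirrors the two-player structure the abstract describes: the policy player runs a no-regret update while the adversary re-approximates the worst-case environment each round.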
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Erin_J_Talvitie1
Submission Number: 2364