Solving Robust MDPs through No-Regret Dynamics

18 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: Robust MDPs, Gradient Methods, No-Regret Dynamics
TL;DR: We use No-Regret Dynamics to provide simple algorithms that provably solve Robust MDPs quickly.
Abstract: Reinforcement Learning is a powerful framework for training agents to navigate different situations, but it is susceptible to changes in environmental dynamics. However, solving Markov Decision Processes that are robust to such changes is difficult due to nonconvexity and the complex interaction between the policy and the environment. While most prior works analyze this problem under varying assumptions, a general and efficient theoretical analysis is still missing. We develop a simple, nonconvex no-regret framework for improving robustness by solving an iterative minimax optimization problem in which a policy player and an environmental-dynamics player play against each other. By decoupling the behavior of the two players, our framework yields several scalable algorithms that solve Robust MDPs at a rate of $\mathcal{O}\left(\frac{1}{\sqrt{T}}\right)$ under different conditions, assuming only a convex uncertainty set.
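
As a minimal illustration of the minimax no-regret dynamics described in the abstract (the submission's actual algorithms are not reproduced on this page), the sketch below pits two Hedge (multiplicative-weights) learners against each other on a toy zero-sum matrix game, with the row player standing in for the policy and the column player for the environmental dynamics. The payoff matrix and all names are hypothetical; the averaged strategies approach an equilibrium at the $\mathcal{O}\left(\frac{1}{\sqrt{T}}\right)$ rate cited above.

```python
import numpy as np

# Toy sketch, NOT the submission's algorithm: two Hedge (multiplicative-
# weights) players run no-regret dynamics on a random zero-sum matrix game,
# a stand-in for the policy-vs-environment minimax problem in the abstract.

rng = np.random.default_rng(0)
G = rng.standard_normal((5, 7))  # hypothetical payoff matrix:
                                 # rows = policy player (minimizer),
                                 # cols = dynamics player (maximizer)

T = 5000
eta = np.sqrt(np.log(G.shape[0]) / T)  # standard Hedge step size

x = np.full(G.shape[0], 1.0 / G.shape[0])  # policy mixed strategy
y = np.full(G.shape[1], 1.0 / G.shape[1])  # dynamics mixed strategy
x_avg, y_avg = np.zeros_like(x), np.zeros_like(y)

for _ in range(T):
    x_avg += x / T
    y_avg += y / T
    # Each player observes its current loss/payoff vector and re-weights.
    x *= np.exp(-eta * (G @ y))   # minimizer downweights costly actions
    x /= x.sum()
    y *= np.exp(eta * (G.T @ x))  # maximizer upweights profitable responses
    y /= y.sum()

# The duality gap of the averaged iterates certifies an approximate
# equilibrium; it shrinks on the order of 1/sqrt(T).
gap = (x_avg @ G).max() - (G @ y_avg).min()
print(f"duality gap after {T} rounds: {gap:.4f}")
```

The decoupling mirrors the abstract's framework: each player runs its own no-regret update against the other's current play, and the standard online-learning argument bounds the duality gap of the averaged iterates by the sum of the players' average regrets.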
Supplementary Material: zip
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1141