Towards Minimax Optimality of Model-based Robust Reinforcement Learning

Published: 26 Apr 2024, Last Modified: 15 Jul 2024UAI 2024 oralEveryoneRevisionsBibTeXCC BY 4.0
Keywords: Robust MDPs, Robust Reinforcement Learning, Sample complexity
TL;DR: We study the sample complexity of obtaining an $\epsilon$-optimal policy in Robust discounted Markov Decision Processes (RMDPs), given only access to a generative model of the nominal kernel.
Abstract: We study the sample complexity of obtaining an $\epsilon$-optimal policy in \emph{Robust} discounted Markov Decision Processes (RMDPs), given only access to a generative model of the nominal kernel. This problem is widely studied in the non-robust case, and it is known that any planning approach applied to an empirical MDP estimated with $\tilde{\mathcal{O}}(\frac{H^3 |S||A|}{\epsilon^2})$ samples provides an $\epsilon$-optimal policy, which is minimax optimal. Results in the robust case are much more scarce. For $sa$- (resp $s$-) rectangular uncertainty sets, until recently the best-known sample complexity was $\tilde{\mathcal{O}}(\frac{H^4 |S|^2|A|}{\epsilon^2})$ (resp. $\tilde{\mathcal{O}}(\frac{H^4 | S |^2| A |^2}{\epsilon^2})$), for specific algorithms and when the uncertainty set is based on the total variation (TV), the KL or the Chi-square divergences. In this paper, we consider uncertainty sets defined with an $L_p$-ball (recovering the TV case), and study the sample complexity of \emph{any} planning algorithm (with high accuracy guarantee on the solution) applied to an empirical RMDP estimated using the generative model. In the general case, we prove a sample complexity of $\tilde{\mathcal{O}}(\frac{H^4 | S || A |}{\epsilon^2})$ for both the $sa$- and $s$-rectangular cases (improvements of $| S |$ and $| S || A |$ respectively). When the size of the uncertainty is small enough, we improve the sample complexity to $\tilde{\mathcal{O}}(\frac{H^3 | S || A | }{\epsilon^2})$, recovering the lower-bound for the non-robust case for the first time and a robust lower-bound. Finally, we also introduce simple and efficient algorithms for solving the studied $L_p$ robust MDPs.
List Of Authors: Clavier, Pierre and Erwan, Le Pennec and Matthieu, Geist
Latex Source Code: zip
Signed License Agreement: pdf
Submission Number: 76
Loading