KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal

Submitted: 22 Sept 2022 (modified: 13 Feb 2023)
Venue: ICLR 2023 Conference Withdrawn Submission
Readers: Everyone
Keywords: Reinforcement Learning, Minimax-Optimality, Generative Model, KL Regularization, Entropy Regularization
TL;DR: We show that KL-entropy-regularized value iteration is minimax-optimal under the generative model setting.
Abstract: In this work, we consider and analyze the sample complexity of model-free reinforcement learning with a generative model. In particular, we analyze mirror descent value iteration (MDVI) by Geist et al. (2019) and Vieillard et al. (2020a), which uses the Kullback-Leibler divergence and entropy regularization in its value and policy updates. Our analysis shows that MDVI is nearly minimax-optimal for finding an ε-optimal policy when ε is sufficiently small. This is the first theoretical result demonstrating that a simple model-free algorithm without variance reduction can be nearly minimax-optimal under the considered setting.
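For context, one iteration of the KL-entropy-regularized update underlying MDVI can be sketched as follows. This is a minimal sketch based on the formulation in Geist et al. (2019) and Vieillard et al. (2020a); the notation λ (KL weight), τ (entropy weight), q_k (action-value estimate), and π_k (current policy) is an assumption for illustration, not taken from this page.

% Requires amsmath. One MDVI iteration (sketch): a regularized greedy step
% (KL toward the previous policy plus an entropy bonus), followed by a
% regularized evaluation step.
% \lambda: KL weight, \tau: entropy weight, q_k: action-value estimate, \pi_k: current policy.
\begin{align*}
  \pi_{k+1} &\in \arg\max_{\pi}\;
    \langle \pi, q_k \rangle
    - \lambda\,\mathrm{KL}(\pi \,\|\, \pi_k)
    + \tau\,\mathcal{H}(\pi), \\
  q_{k+1} &= r + \gamma P\Big(
    \langle \pi_{k+1}, q_k \rangle
    - \lambda\,\mathrm{KL}(\pi_{k+1} \,\|\, \pi_k)
    + \tau\,\mathcal{H}(\pi_{k+1})\Big).
\end{align*}
% The greedy step admits the closed form
\[
  \pi_{k+1}(a \mid s) \;\propto\;
  \pi_k(a \mid s)^{\lambda/(\lambda+\tau)}
  \exp\!\big(q_k(s,a)/(\lambda+\tau)\big),
\]
% i.e., a softmax of q_k tempered by (\lambda + \tau) and anchored to \pi_k.

In the generative-model setting considered in the abstract, the expectation under the transition kernel P in the evaluation step would be replaced by an empirical average over next states drawn from the generative model; this reading of the setup is an assumption here, not a statement quoted from the paper.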
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Theory (eg, control theory, learning theory, algorithmic game theory)