Keywords: Double Q-learning, Finite-time analysis, Convergence rate, Stochastic approximation
Abstract: Double Q-learning \citep{hasselt2010double} has been highly successful in practice because it effectively overcomes the overestimation issue of Q-learning. However, the theoretical understanding of double Q-learning is rather limited, and the only existing finite-time analysis was recently established in \citet{xiong2020double} under a polynomial learning rate. This paper analyzes the more challenging case of a rescaled linear/constant learning rate, for which the previous method does not appear to be applicable. We develop new analytical tools that achieve an order-level better finite-time convergence rate than the previously established result. Specifically, we show that synchronous double Q-learning attains an $\epsilon$-accurate global optimum with a time complexity of $\Omega\left(\frac{\ln D}{(1-\gamma)^7\epsilon^2} \right)$, and that asynchronous double Q-learning attains a time complexity of $\tilde{\Omega}\left(\frac{L}{(1-\gamma)^7\epsilon^2} \right)$, where $D$ is the cardinality of the state-action space, $\gamma$ is the discount factor, and $L$ is a parameter related to the sampling strategy for asynchronous double Q-learning. These results improve the order-level dependence of the convergence rate on all major parameters $(\epsilon,1-\gamma, D, L)$ relative to \citet{xiong2020double}. The new analysis presents a more direct and succinct approach for characterizing the finite-time convergence rate of double Q-learning.
One-sentence Summary: This paper provides an order-level better finite-time convergence rate for double Q-learning by developing a new analysis approach.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Supplementary Material: zip
Reviewed Version (pdf): https://openreview.net/references/pdf?id=TgaqUNN1UY