Model-independent O(1/k)-convergence rate for TD(0) with linear function approximation, universal learning steps and i.i.d. samples
TL;DR: Under a standard setting, we prove the first convergence result for TD(0) with a rate that is optimal in the number of iterations and robust to ill-conditioning.
Abstract: In this paper, we study the finite-time behaviour of the TD(0) temporal-difference method with linear function approximation (LFA). We consider on-policy i.i.d. samples, a constant learning step, and the Polyak-Juditsky averaging method. We establish a new convergence rate that is (i) optimal in the number of iterations $k$ (i.e., of order $1/k$) and (ii) model-independent: it does not depend on the choice of the linear parametrisation and is robust to ill-conditioning. This resolves a question posed by Lakshminarayanan and Szepesvári (2018) about the attainability of such a rate, which had been open for more than seven years. Our analysis extends to TD(0) the results of Bach and Moulines (2013), who obtained a similar rate for Stochastic Gradient Descent (SGD) on least-squares regression problems.
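The algorithm studied in the abstract can be illustrated with a minimal sketch: TD(0) with linear function approximation, i.i.d. on-policy samples, a constant learning step, and Polyak-Juditsky averaging of the iterates. The toy 3-state chain, feature matrix, step size, and all numerical values below are hypothetical illustrations, not taken from the paper.

```python
import numpy as np

def td0_polyak_averaging(num_iters=20000, alpha=0.05, gamma=0.9, seed=0):
    """Sketch of averaged TD(0) with linear function approximation on a
    toy 3-state chain (hypothetical example, not from the paper)."""
    rng = np.random.default_rng(seed)
    n_states, d = 3, 2
    # Fixed random features phi(s), one row per state.
    Phi = rng.standard_normal((n_states, d))
    # Transition kernel and rewards under the target policy (made up).
    P = np.array([[0.1, 0.8, 0.1],
                  [0.2, 0.6, 0.2],
                  [0.3, 0.3, 0.4]])
    r = np.array([1.0, 0.0, -1.0])
    # Sample states i.i.d. from the stationary distribution of P
    # (on-policy sampling), obtained here by power iteration.
    mu = np.ones(n_states) / n_states
    for _ in range(200):
        mu = mu @ P

    theta = np.zeros(d)
    theta_bar = np.zeros(d)
    for k in range(1, num_iters + 1):
        s = rng.choice(n_states, p=mu)             # i.i.d. state sample
        s_next = rng.choice(n_states, p=P[s])      # one transition from s
        td_error = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
        theta = theta + alpha * td_error * Phi[s]  # constant-step TD(0)
        theta_bar += (theta - theta_bar) / k       # running Polyak average
    return theta_bar
```

The returned vector is the averaged iterate $\bar{\theta}_k = \frac{1}{k}\sum_{i=1}^{k}\theta_i$; the paper's result concerns the rate at which such averaged iterates approach the TD fixed point.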
Submission Number: 2399