Fast and Robust Convergence Rate for TD(0) with Linear Function Approximation, Universal Learning Steps and I.I.D. Samples

Published: 03 Feb 2026, Last Modified: 02 May 2026, AISTATS 2026 Poster, CC BY 4.0
TL;DR: Under a standard setting, we prove the first convergence result for TD(0) admitting a rate that is optimal in the number of iterations and robust to ill-conditioning.
Abstract: In this paper, we study the finite-time behavior of the TD(0) temporal-difference method with linear function approximation (LFA). We consider on-policy independent and identically distributed (i.i.d.) samples, a constant learning step, and Polyak-Juditsky averaging. We establish a new convergence rate for the Mean-Square Error (MSE) on the approximated function that is (i) *fast*, in the sense that it has the optimal dependence on the number of iterations $k$ (i.e., of order $1/k$); (ii) *robust* to ill-conditioning, in that it depends only on an initial error and model-independent constants; and (iii) *sharp* up to a multiplicative constant smaller than $11$. In particular, it does not depend on the smallest eigenvalue of the uncentered covariance matrix of the linear parametrization, unlike all pre-existing $O(1/k)$ rates in the TD(0) literature. We also introduce PCTD(0), a variant of TD(0) that benefits from better convergence properties under an additional strong-mixing assumption on the Markov chain.
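The setting described in the abstract (TD(0) with linear features, i.i.d. on-policy samples, a constant step, and iterate averaging) can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration and is not taken from the paper or its repository: the 3-state Markov reward process, the feature vectors, the discount factor, the step size `0.1`, the convention of averaging the iterates $\theta_1, \dots, \theta_k$, and the function names `averaged_td0` and `td_fixed_point`. In particular, the step used here is not claimed to be the paper's universal step, and PCTD(0) is not shown.

```python
import random

# Hypothetical 3-state Markov reward process (all numbers are illustrative).
P = [[0.50, 0.25, 0.25],
     [0.25, 0.50, 0.25],
     [0.25, 0.25, 0.50]]   # doubly stochastic, so the stationary law is uniform
R = [1.0, 0.0, -1.0]       # reward received in each state
PHI = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]  # unit-norm feature vectors, d = 2
GAMMA = 0.9                # discount factor


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def averaged_td0(num_iters, step, seed=0):
    """TD(0) with linear features, i.i.d. samples, a constant step, and a
    running average of the iterates theta_1, ..., theta_k."""
    rng = random.Random(seed)
    states = list(range(len(P)))
    theta = [0.0, 0.0]
    theta_bar = [0.0, 0.0]
    for k in range(1, num_iters + 1):
        s = rng.choice(states)                         # s ~ stationary (uniform) law
        s_next = rng.choices(states, weights=P[s])[0]  # s' ~ P(s, .)
        # Temporal-difference error for the sampled transition.
        delta = R[s] + GAMMA * dot(PHI[s_next], theta) - dot(PHI[s], theta)
        theta = [t + step * delta * f for t, f in zip(theta, PHI[s])]
        # Incremental form of the arithmetic mean of the iterates.
        theta_bar = [b + (t - b) / k for b, t in zip(theta_bar, theta)]
    return theta_bar


def td_fixed_point():
    """Exact TD(0) fixed point: solve A theta = b, where
    A = E[phi(s) (phi(s) - gamma * phi(s'))^T] and b = E[phi(s) r(s)]."""
    n = len(P)
    A = [[0.0, 0.0], [0.0, 0.0]]
    b = [0.0, 0.0]
    for s in range(n):
        exp_next = [sum(P[s][t] * PHI[t][j] for t in range(n)) for j in range(2)]
        diff = [PHI[s][j] - GAMMA * exp_next[j] for j in range(2)]
        for i in range(2):
            b[i] += PHI[s][i] * R[s] / n
            for j in range(2):
                A[i][j] += PHI[s][i] * diff[j] / n
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    # Cramer's rule for the 2x2 system.
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]


if __name__ == "__main__":
    theta_star = td_fixed_point()
    for k in (1000, 10000, 100000):
        theta_bar = averaged_td0(k, step=0.1)
        sq_err = sum((a - b) ** 2 for a, b in zip(theta_bar, theta_star))
        print(k, sq_err)
```

Running the script prints the squared parameter error of the averaged iterate for a few values of $k$ on this toy problem. A single run is noisy, so it only gives a rough feel for the decay with $k$; it is not a check of the rate or of the constants stated in the abstract.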
Code Dataset Promise: Yes
Code Dataset Url: https://github.com/ziadkobeissi/Robust_and_Fast_Convergence_TD0
Signed Copyright Form: pdf
Format Confirmation: I agree that I have read and followed the formatting instructions for the camera ready version.
Submission Number: 2399