A Finite-Time Analysis of TD Learning with Linear Function Approximation without Projections or Strong Convexity
Keywords: Temporal Difference learning, linear function approximation, finite-time convergence, Markovian noise, reinforcement learning theory
TL;DR: This paper proves that projection-free TD learning converges at rate $\widetilde{\mathcal{O}}(\|\theta^*\|_2^2/\sqrt{T})$ under Markovian noise, without assuming strong convexity.
Abstract: We investigate the finite-time convergence properties of Temporal Difference (TD) learning with linear function approximation, a cornerstone algorithm in the field of reinforcement learning.
We are interested in the so-called ``robust'' setting, where the convergence guarantee does not depend on the minimal curvature of the potential function.
While prior work has established convergence guarantees in this setting, these results typically assume that each iterate is projected onto a bounded set, a condition that is artificial and does not match current practice.
In this paper, we challenge the necessity of this assumption and present a refined analysis of TD learning. For the first time, we show that the simple projection-free variant converges at a rate of $\widetilde{\mathcal{O}}(\frac{\|\theta^*\|^2_2}{\sqrt{T}})$, even in the presence of Markovian noise. Our analysis reveals a novel self-bounding property of the TD updates and exploits it to guarantee that the iterates remain bounded.
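For readers unfamiliar with the algorithm under study, the following is a minimal sketch of projection-free TD(0) with linear function approximation, run along a single Markovian trajectory. It is an illustration only, not the authors' implementation: the function name `td0_linear`, the two-state chain, the one-hot features, the discount factor, and the constant step size are all assumptions chosen for this example, and the abstract does not specify the step-size schedule or any iterate averaging used in the analysis.

```python
import random


def td0_linear(features, transition, reward, gamma, step_size, num_steps, seed=0):
    """Projection-free TD(0) with the linear estimate V(s) = features[s] . theta.

    All arguments are hypothetical inputs for this sketch:
    features[s] is the feature vector of state s, transition[s] is the row of
    next-state probabilities, and reward[s] is the reward received in state s.
    """
    rng = random.Random(seed)
    num_states = len(transition)
    theta = [0.0] * len(features[0])
    state = 0
    for _ in range(num_steps):
        # Follow one trajectory of the chain, so consecutive samples are
        # correlated (Markovian noise) rather than i.i.d.
        next_state = rng.choices(range(num_states), weights=transition[state])[0]
        phi, phi_next = features[state], features[next_state]
        value = sum(p * w for p, w in zip(phi, theta))
        value_next = sum(p * w for p, w in zip(phi_next, theta))
        td_error = reward[state] + gamma * value_next - value
        # Plain semi-gradient step: theta is never projected onto a bounded set.
        theta = [w + step_size * td_error * p for w, p in zip(theta, phi)]
        state = next_state
    return theta


# Toy two-state chain (assumed for illustration) with one-hot features.
# Solving V = r + gamma * P V by hand gives V = [1.5, 0.5].
features = [[1.0, 0.0], [0.0, 1.0]]
transition = [[0.5, 0.5], [0.5, 0.5]]
reward = [1.0, 0.0]
theta = td0_linear(features, transition, reward,
                   gamma=0.5, step_size=0.01, num_steps=20000)
print(theta)
```

With one-hot features the fixed point $\theta^*$ coincides with the true value function, so the printed vector should land close to $(1.5, 0.5)$, up to fluctuations caused by the constant step size.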
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 12465