Track: Research Track
Keywords: temporal difference, finite time analysis, exponential step-size
Abstract: Temporal difference (TD) learning is a fundamental algorithm for estimating value functions in reinforcement learning. However, its empirical performance is sensitive to the choice of the step-size parameter. While recent finite-time analyses of TD with linear function approximation have quantified its theoretical convergence rate, these analyses do not provide a practical scheme for setting the step-size. In fact, they require algorithm parameters to be set according to typically unknown problem-dependent quantities, such as the minimum eigenvalue ($\omega$) of the feature covariance matrix or the mixing time ($\tau_{\text{mix}}$) of the underlying Markov chain. These requirements limit the practical use of the resulting algorithms. Inspired by the optimization literature, we address these limitations by using an exponential step-size schedule in two standard sampling regimes: i.i.d. sampling from the stationary distribution, and the more practical Markovian sampling from a single trajectory. Unlike previous work in the i.i.d. setting, the proposed algorithm does not require knowledge of $\omega$, is adaptive to the noise in the TD updates, and effectively trades off bias and variance. In the Markovian setting, TD with exponential step-sizes achieves a convergence rate similar to that of previous work, but requires no projections, averaging, or knowledge of $\tau_{\text{mix}}$. Finally, we experimentally demonstrate that our algorithm is competitive with existing methods and robust across environments and sampling protocols.
Submission Number: 110
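To make the setting concrete, the following is a minimal sketch of TD(0) with linear function approximation driven by an exponentially decaying step-size. The specific schedule $\eta_t = \eta_0 \alpha^t$, the constants, and the toy two-state chain are illustrative assumptions, not the exact algorithm, tuning, or analysis from the submission.

```python
# Illustrative sketch (not the paper's exact method): TD(0) with linear
# value approximation V(s) = phi(s)^T theta and an assumed exponential
# step-size schedule eta_t = eta0 * alpha**t. No projections or averaging.
import numpy as np

def td0_exponential_stepsize(sample_transition, num_steps, dim,
                             gamma=0.99, eta0=1.0, alpha=0.999):
    """Run TD(0) updates using transitions from `sample_transition`.

    sample_transition: callable returning (phi_s, reward, phi_next) for one
        observed transition, either sampled i.i.d. from the stationary
        distribution or taken along a single Markovian trajectory.
    """
    theta = np.zeros(dim)
    for t in range(num_steps):
        phi_s, reward, phi_next = sample_transition()
        # TD error: r + gamma * V(s') - V(s)
        td_error = reward + gamma * phi_next @ theta - phi_s @ theta
        # Exponentially decaying step-size (assumed schedule and constants)
        eta_t = eta0 * alpha ** t
        # Standard TD(0) update on the linear parameters
        theta = theta + eta_t * td_error * phi_s
    return theta

# Hypothetical usage on a toy two-state chain with random features.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.standard_normal((2, 4))     # phi(s) for states 0 and 1
    P = np.array([[0.9, 0.1], [0.2, 0.8]])     # transition probabilities
    rewards = np.array([1.0, 0.0])

    state = [0]                                # mutable holder for the current state
    def sample_transition():
        s = state[0]
        s_next = rng.choice(2, p=P[s])
        state[0] = s_next                      # Markovian: follow one trajectory
        return features[s], rewards[s], features[s_next]

    theta_hat = td0_exponential_stepsize(sample_transition, num_steps=5000, dim=4)
    print("estimated state values:", features @ theta_hat)
```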