Track: Research Track
Keywords: temporal difference, finite time analysis, exponential step-size
Abstract: Temporal difference (TD) learning is a fundamental algorithm for estimating value functions in reinforcement learning. However, its empirical performance is sensitive to the choice of the step-size parameter. While recent finite-time analyses of TD with linear function approximation have quantified its theoretical convergence rate, these analyses do not provide a practical scheme for setting the step-size. In fact, they require algorithm parameters to be set according to typically unknown problem-dependent quantities, such as the minimum eigenvalue ($\omega$) of the feature covariance matrix or the mixing time ($\tau_{\text{mix}}$) of the underlying Markov chain. These requirements limit the practical use of the resulting algorithms. Inspired by the optimization literature, we address these limitations by using an exponential step-size schedule in two standard sampling regimes: i.i.d. sampling from the stationary distribution, and the more practical Markovian sampling from a single trajectory. Unlike previous work in the i.i.d. setting, the proposed algorithm does not require knowledge of $\omega$, is adaptive to the noise in the TD updates, and effectively trades off bias and variance. In the Markovian setting, TD with exponential step-sizes achieves a convergence rate similar to that of previous work, but requires no projections, averaging, or knowledge of $\tau_{\text{mix}}$. Finally, we experimentally demonstrate that our algorithm is competitive with existing methods and robust across environments and sampling protocols.
Submission Number: 110
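To make the setting concrete, the following is a minimal sketch of TD(0) with linear function approximation driven by an exponentially decaying step-size. The specific schedule $\eta_t = \eta_0 \alpha^t$, the constants, and the toy two-state chain are illustrative assumptions, not the exact algorithm, tuning, or analysis from the submission.

```python
# Illustrative sketch (not the paper's exact method): TD(0) with linear
# value approximation V(s) = phi(s)^T theta and an assumed exponential
# step-size schedule eta_t = eta0 * alpha**t. No projections or averaging.
import numpy as np

def td0_exponential_stepsize(sample_transition, num_steps, dim,
                             gamma=0.99, eta0=1.0, alpha=0.999):
    """Run TD(0) updates using transitions from `sample_transition`.

    sample_transition: callable returning (phi_s, reward, phi_next) for one
        observed transition, either sampled i.i.d. from the stationary
        distribution or taken along a single Markovian trajectory.
    """
    theta = np.zeros(dim)
    for t in range(num_steps):
        phi_s, reward, phi_next = sample_transition()
        # TD error: r + gamma * V(s') - V(s)
        td_error = reward + gamma * phi_next @ theta - phi_s @ theta
        # Exponentially decaying step-size (assumed schedule and constants)
        eta_t = eta0 * alpha ** t
        # Standard TD(0) update on the linear parameters
        theta = theta + eta_t * td_error * phi_s
    return theta

# Hypothetical usage on a toy two-state chain with random features.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.standard_normal((2, 4))     # phi(s) for states 0 and 1
    P = np.array([[0.9, 0.1], [0.2, 0.8]])     # transition probabilities
    rewards = np.array([1.0, 0.0])

    state = [0]                                # mutable holder for the current state
    def sample_transition():
        s = state[0]
        s_next = rng.choice(2, p=P[s])
        state[0] = s_next                      # Markovian: follow one trajectory
        return features[s], rewards[s], features[s_next]

    theta_hat = td0_exponential_stepsize(sample_transition, num_steps=5000, dim=4)
    print("estimated state values:", features @ theta_hat)
```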