A Diffusion Approximation for Temporal-Difference Learning with Linear Features under Markovian Noise

Published: 25 May 2026, Last Modified: 27 May 2026, DEMO 2026 Poster, Readers: Everyone, License: CC BY 4.0
Keywords: temporal difference learning, linear TD, stochastic approximation, diffusion approximation, Markov noise, stochastic differential equations, Poisson equation, long-run covariance, reinforcement learning theory
TL;DR: We introduce an SDE approximation for linear TD(0) under Markovian noise that captures both mean convergence and finite-time stochastic fluctuations, yielding interpretable covariance, stability, and stepsize insights beyond the classical ODE view.
Abstract: Temporal-difference (TD) learning with linear function approximation is a core method for policy evaluation. Its classical continuous-time description is an ordinary differential equation (ODE), which captures the asymptotic mean dynamics but neglects the stochastic fluctuations that determine the error floor. We introduce a stochastic differential equation (SDE) approximation for linear TD(0) under Markovian noise. The resulting model separates the contraction dynamics governed by the projected Bellman operator from the influence of Markovian sampling. As consequences, we complement classical results with covariance dynamics, a local Ornstein-Uhlenbeck description, an explicit estimate of how the mixing time influences convergence, and a new range of admissible stepsizes.
Submission Number: 93
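
The gap the abstract describes, between the mean dynamics captured by the ODE and the fluctuations that set the error floor, can be seen in a small simulation. The sketch below is not taken from the paper: the two-state chain, the scalar features, the rewards, and the helper names `td_fixed_point` and `run_td0` are all hypothetical choices made for illustration. It runs constant-stepsize linear TD(0) along a single Markov trajectory and compares the iterates with the equilibrium of the classical mean ODE, theta' = b - A * theta.

```python
import random

# Hypothetical 2-state Markov reward process (all numbers are illustrative).
P = [[0.9, 0.1], [0.2, 0.8]]          # transition probabilities
phi = [1.0, 2.0]                      # one scalar feature per state
reward = [1.0, 0.0]                   # reward received on leaving each state
gamma = 0.9                           # discount factor
stationary = [2.0 / 3.0, 1.0 / 3.0]   # stationary distribution of P


def td_fixed_point():
    """Equilibrium b / A of the mean ODE theta' = b - A * theta (scalar case)."""
    A = 0.0
    b = 0.0
    for s in range(2):
        expected_next_phi = sum(P[s][t] * phi[t] for t in range(2))
        A += stationary[s] * phi[s] * (phi[s] - gamma * expected_next_phi)
        b += stationary[s] * phi[s] * reward[s]
    return b / A


def run_td0(alpha, steps, seed=0):
    """Run linear TD(0) on one trajectory; return mean and variance of the
    iterates over the second half of the run."""
    rng = random.Random(seed)
    s, theta = 0, 0.0
    tail = []
    for k in range(steps):
        s_next = 0 if rng.random() < P[s][0] else 1
        td_error = reward[s] + gamma * phi[s_next] * theta - phi[s] * theta
        theta += alpha * td_error * phi[s]
        s = s_next
        if k >= steps // 2:
            tail.append(theta)
    mean = sum(tail) / len(tail)
    var = sum((x - mean) ** 2 for x in tail) / len(tail)
    return mean, var


theta_star = td_fixed_point()
tail_mean, tail_var = run_td0(alpha=0.01, steps=200_000)
print("mean-ODE equilibrium:", theta_star)
print("average TD(0) iterate:", tail_mean)
print("variance of TD(0) iterates:", tail_var)
```

In this toy setting the averaged iterate should sit close to the ODE equilibrium, while the variance of the iterates stays strictly positive for a constant stepsize. That residual spread is the quantity an ODE description leaves out and a diffusion approximation is meant to describe.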