The Effect of Temporal Resolution in Offline Temporal Difference Estimation

ICLR 2026 Conference Submission 19350 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Reinforcement Learning, Temporal Difference, Temporal Discretization, Continuous Time, Value Estimation, LQR
TL;DR: This paper analyzes the impact of temporal resolution on offline temporal difference learning, showing a non-trivial trade-off in time discretization
Abstract: Temporal Difference (TD) algorithms are the most widely employed methods in Reinforcement Learning. Notably, previous theoretical analyses of these algorithms treat the sampling time as fixed a priori, even though the temporal resolution has been shown to affect data efficiency (Burns et al., 2023). In this work, we analyze the performance of mean-path semi-gradient TD(0) for offline value estimation, emphasizing its dependence on the temporal resolution, a factor that indeed proves to be of crucial importance. In particular, for continuous-time stochastic linear quadratic dynamical systems with a fixed data budget, the Mean Squared Error of the value estimate attains its minimum at a non-trivial time discretization, which also impacts the reliability of the algorithm. We further show that this behavior differs from that of the Monte Carlo algorithm (Zhang et al., 2023). We verify the theoretical characterization with numerical experiments on linear quadratic system instances.
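To make the setting concrete, below is a minimal sketch (not the authors' implementation) of semi-gradient TD(0) value estimation on an Euler-Maruyama discretization of a scalar continuous-time linear quadratic system with a fixed data budget. All parameter names (a, sigma, q, rho, T, h) and the quadratic feature are illustrative assumptions; varying the sampling period h under a fixed horizon T is what the paper's discretization trade-off refers to.

```python
# Sketch: semi-gradient TD(0) on a discretized continuous-time LQ system.
# Assumed dynamics: dx = a*x dt + sigma dW, cost q*x^2, discount rate rho.
import numpy as np

rng = np.random.default_rng(0)

a, sigma, q, rho = -1.0, 0.5, 1.0, 0.1   # drift, noise, state cost, discount rate
T, h = 100.0, 0.1                        # fixed data budget T; sampling period h
n_steps = int(T / h)
gamma = np.exp(-rho * h)                 # discrete discount induced by the rate rho

# Simulate one offline trajectory of the uncontrolled dynamics.
x = np.empty(n_steps + 1)
x[0] = 1.0
for k in range(n_steps):
    x[k + 1] = x[k] + a * x[k] * h + sigma * np.sqrt(h) * rng.standard_normal()

rewards = -q * x[:-1] ** 2 * h           # per-step cost accumulated over h

# Semi-gradient TD(0) with a single quadratic feature phi(x) = x^2,
# so V_w(x) = w * x^2; only the current state's gradient enters the update.
w, alpha = 0.0, 0.01
for k in range(n_steps):
    phi, phi_next = x[k] ** 2, x[k + 1] ** 2
    td_error = rewards[k] + gamma * w * phi_next - w * phi
    w += alpha * td_error * phi          # semi-gradient update

print(f"estimated value weight w: {w:.4f}")
```

With T held fixed, a smaller h yields more (but noisier and more strongly correlated) transitions, while a larger h yields fewer, coarser ones; the paper's analysis characterizes the resulting non-trivial optimum of the estimation error in h.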
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 19350