Spectral Perturbation Bounds for Experience Replay: A Bias–Variance Decomposition for Offline Decision-Making

Saket Atreya

Published: 25 May 2026, Last Modified: 27 May 2026. Venue: DEMO 2026 Poster. License: CC BY 4.0

Keywords: Offline Reinforcement Learning, Experience Replay, Spectral Theory, Markov Chains, Mixing Times, Bias-Variance Tradeoff, Conservative RL, Distribution Shift, Policy Evaluation, Spectral Perturbation

TL;DR: We formalize the offline RL bias-variance tradeoff using the spectral theory of Markov chains, proving that dataset diversity accelerates mixing at the cost of distribution-shift bias governed by the environment's mixing time.

Abstract: Offline decision-making relies on datasets collected from heterogeneous and often suboptimal policies, leading to a fundamental trade-off between statistical efficiency and distribution shift. We study this trade-off by modeling an offline dataset as inducing a mixture Markov kernel over state transitions. We show that the diversity of the dataset improves statistical efficiency by accelerating mixing, while deviations between behavior policies introduce bias that scales with policy distance. Our analysis yields a bias-variance decomposition in which both terms are governed by the environment's mixing time. This provides a principled explanation for conservative methods in offline RL and predicts that the value of diverse datasets is strongly environment-dependent. We theoretically formalize this spectral framework and provide empirical validations using synthetic MDPs to demonstrate the tight coupling between spectral perturbations and the efficacy of offline dataset utilization.

Submission Number: 46
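
The mixture-kernel view described in the abstract can be illustrated with a minimal sketch. It is not the paper's construction: the random kernels, the state-independent mixture weights, and the helper names `random_kernel`, `mixture_kernel`, and `spectral_gap` are all assumptions made for illustration. The sketch builds a few row-stochastic transition matrices (standing in for the state-transition kernels induced by different behavior policies), averages them into a mixture kernel, and compares absolute spectral gaps, a common proxy for mixing speed (a larger gap suggests faster mixing, with the sharpest guarantees holding for reversible chains).

```python
import numpy as np

def random_kernel(n_states, rng):
    """Hypothetical stand-in for a behavior policy's state-transition kernel:
    a dense row-stochastic matrix with strictly positive entries."""
    P = rng.random((n_states, n_states))
    return P / P.sum(axis=1, keepdims=True)

def mixture_kernel(kernels, weights):
    """Convex combination of transition matrices (weights are assumed to be
    state-independent dataset proportions; they are normalized here)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * P for w, P in zip(weights, kernels))

def spectral_gap(P):
    """Absolute spectral gap: 1 minus the second-largest eigenvalue modulus."""
    moduli = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    return 1.0 - moduli[1]

rng = np.random.default_rng(0)
kernels = [random_kernel(5, rng) for _ in range(3)]
P_mix = mixture_kernel(kernels, [0.5, 0.3, 0.2])

gaps = [spectral_gap(P) for P in kernels]
gap_mix = spectral_gap(P_mix)

print("per-policy spectral gaps:", np.round(gaps, 3))
print("mixture spectral gap:    ", round(float(gap_mix), 3))
```

Whether the mixture gap exceeds the individual gaps depends on the kernels chosen, so the sketch only shows how the comparison could be set up, not what its outcome should be.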
