Stability and Generalization for Bellman Residuals

14 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: statistical learning theory, algorithmic stability, generalization analysis, offline reinforcement learning, inverse reinforcement learning
TL;DR: Our analysis yields an $\mathcal{O}(1/n)$ on-average argument-stability bound for Bellman residual minimization—doubling the best known sample-complexity exponent for convex–concave saddle problems.
Abstract: Offline reinforcement learning and offline inverse reinforcement learning aim to recover near-optimal value functions or reward models from a fixed batch of logged trajectories, yet current practice still struggles to enforce Bellman consistency. Bellman residual minimization (BRM) has emerged as an attractive remedy, since a globally convergent method for BRM based on stochastic gradient descent-ascent (SGDA) was recently discovered. However, its statistical behavior in the offline setting remains largely unexplored. In this paper, we close this statistical gap. Our analysis introduces a single Lyapunov potential that couples SGDA runs on neighbouring datasets and yields an $\mathcal{O}(1/n)$ on-average argument-stability bound, doubling the best known sample-complexity exponent for convex-concave saddle problems. The same stability constant translates into an $\mathcal{O}(1/n)$ excess-risk bound for BRM, without variance reduction, extra regularization, or restrictive independence assumptions on minibatch sampling. The results hold for standard neural-network parameterizations and minibatch SGD.
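To make the setting concrete, below is a minimal, hypothetical sketch (not the paper's method) of minibatch SGDA applied to a saddle-point formulation of Bellman residual minimization, $\min_\theta \max_w \mathbb{E}[w(x)\,\delta_\theta(x) - w(x)^2/2]$, whose inner maximum recovers half the squared Bellman residual. The toy dataset, linear parameterization, and step sizes are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative logged dataset: random features for (s, a) and (s', a'),
# with rewards generated so that Q*(s, a) = phi(s, a) @ true_theta is Bellman-consistent.
n, d, gamma = 512, 3, 0.9
phi = rng.normal(size=(n, d))           # features phi(s, a)
phi_next = rng.normal(size=(n, d))      # features phi(s', a')
true_theta = np.array([1.0, -0.5, 0.25])
r = (phi - gamma * phi_next) @ true_theta

def residual(theta):
    """Empirical squared Bellman residual 0.5 * mean(delta^2)."""
    delta = r + gamma * (phi_next @ theta) - phi @ theta
    return 0.5 * np.mean(delta ** 2)

# Saddle objective: L(theta, w) = E[ w(x) * delta_theta(x) - w(x)^2 / 2 ],
# with both the primal Q and the dual critic w linear in the features.
theta = np.zeros(d)
w = np.zeros(d)
eta, batch = 0.05, 32

r0 = residual(theta)
for _ in range(2000):
    idx = rng.integers(0, n, size=batch)
    f, fn, rb = phi[idx], phi_next[idx], r[idx]
    delta = rb + gamma * (fn @ theta) - f @ theta
    wx = f @ w
    # descent step on theta: gradient of E[w * delta] in theta is w(x) * (gamma phi' - phi)
    g_theta = ((gamma * fn - f).T @ wx) / batch
    # ascent step on w: gradient of E[w * delta - w^2 / 2] in w is phi * (delta - w(x))
    g_w = (f.T @ (delta - wx)) / batch
    theta -= eta * g_theta
    w += eta * g_w

assert residual(theta) < 0.5 * r0  # SGDA drives the Bellman residual down
```

The quadratic penalty $-w^2/2$ makes the inner problem strongly concave, which is what lets plain simultaneous SGDA converge here without variance reduction; the stability analysis in the abstract concerns how such coupled runs behave on neighbouring datasets.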
Supplementary Material: pdf
Primary Area: learning theory
Submission Number: 5065