Finite-Time Analysis of Actor-Critic Methods with Deep Neural Network Approximation

ICLR 2026 Conference Submission11001 Authors

Published: 26 Jan 2026, Last Modified: 26 Jan 2026, ICLR 2026, CC BY 4.0
Keywords: finite-time analysis, actor-critic, deep neural network
TL;DR: This paper provides the first finite-time analysis of single-timescale AC with deep neural network approximation in continuous state–action spaces, a setting ubiquitous in modern RL.
Abstract: Actor–critic (AC) algorithms underpin many of today’s most successful reinforcement learning (RL) applications, yet their finite-time convergence in realistic settings remains largely underexplored. Existing analyses often rely on oversimplified formulations and are largely confined to linear function approximation. In practice, however, nonlinear approximation with deep neural networks dominates AC implementations, leaving a substantial gap between theory and practice. In this work, we provide the first finite-time analysis of single-timescale AC with deep neural network approximation in continuous state–action spaces. In particular, we consider the challenging time-average reward setting, where one must simultaneously control three highly coupled error terms: the reward error, the critic error, and the actor error. Our analysis establishes convergence to a stationary point at a rate $\widetilde{\mathcal{O}}(T^{-1/2})$, where $T$ denotes the total number of iterations, thereby providing theoretical grounding for widely used deep AC methods. We substantiate these guarantees with experiments that confirm the proven convergence rate and further demonstrate strong performance on MuJoCo benchmarks.
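To make the single-timescale, average-reward structure described in the abstract concrete, below is a minimal sketch (not the authors' algorithm or implementation): at every iteration a running average-reward estimate, a neural-network critic, and a neural-network actor are each updated once, with step sizes of comparable order. The toy one-dimensional environment, network sizes, policy noise, and step sizes are illustrative assumptions, not values from the paper.

```python
# Minimal single-timescale actor-critic sketch in the average-reward setting.
# Per iteration: (1) update the average-reward estimate, (2) take one critic
# gradient step on the TD error, (3) take one actor policy-gradient step.
import torch
import torch.nn as nn
from torch.distributions import Normal

torch.manual_seed(0)

class ToyEnv:
    """Hypothetical 1-D continuous state-action environment (illustration only)."""
    def __init__(self):
        self.s = torch.zeros(1)
    def step(self, a):
        self.s = torch.clamp(self.s + a + 0.1 * torch.randn(1), -5.0, 5.0)
        r = -(self.s ** 2).item()          # reward for staying near the origin
        return self.s.clone(), r

actor = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
critic = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
opt_actor = torch.optim.SGD(actor.parameters(), lr=1e-3)
opt_critic = torch.optim.SGD(critic.parameters(), lr=1e-3)

env, s = ToyEnv(), torch.zeros(1)
eta = 0.0        # running estimate of the average reward (the "reward error" term)
beta = 1e-2      # step size for the average-reward estimate

for t in range(10_000):
    # Gaussian policy parameterized by the actor network.
    mean = actor(s)
    dist = Normal(mean, 0.5)
    a = dist.sample()
    s_next, r = env.step(a.detach())

    # Average-reward TD error: r - eta + V(s') - V(s), semi-gradient (V(s') detached).
    with torch.no_grad():
        v_next = critic(s_next)
    v = critic(s)
    delta = r - eta + v_next - v

    # (1) Average-reward estimate update.
    eta = eta + beta * (r - eta)

    # (2) Critic update on squared TD error.
    opt_critic.zero_grad()
    (0.5 * delta.pow(2)).mean().backward()
    opt_critic.step()

    # (3) Actor update: policy gradient weighted by the TD error.
    opt_actor.zero_grad()
    (-delta.detach() * dist.log_prob(a)).mean().backward()
    opt_actor.step()

    s = s_next
```

All three quantities are advanced in the same loop iteration; no inner loop lets the critic (or the reward estimate) converge before the actor moves, which is precisely the coupling the analysis must control.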
Supplementary Material: zip
Primary Area: learning theory
Submission Number: 11001