Order-Optimal Global Convergence for Actor-Critic with General Policy and Neural Critic Parametrization
Keywords: Actor-Critic, Q-learning, Sample Complexity
Abstract: This paper addresses the challenge of achieving order-optimal sample complexity in reinforcement learning for discounted Markov Decision Processes (MDPs) with general policy parameterization and multi-layer neural network critics. Existing approaches either fail to achieve the optimal rate or assume a linear critic. We introduce the Natural Actor-Critic with Data Drop (NAC-DD) algorithm, which integrates Natural Policy Gradient methods with a Data Drop technique to mitigate the statistical dependencies inherent in Markovian sampling. NAC-DD achieves an optimal sample complexity of $\tilde{\mathcal{O}}(1/\epsilon^2)$, a significant improvement over the previous state-of-the-art guarantee of $\tilde{\mathcal{O}}(1/\epsilon^3)$. The algorithm employs a multi-layer neural network critic with differentiable activation functions, aligning with real-world applications where tabular policies and linear critics are insufficient. Our work is the first to achieve order-optimal sample complexity for actor-critic methods with neural function approximation, continuous state and action spaces, and Markovian sampling. Empirical evaluations on benchmark tasks confirm the theoretical findings, demonstrating the practical efficacy of the proposed method.
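The abstract does not spell out the Data Drop mechanism, so the snippet below is only a minimal, hypothetical sketch of one plausible reading of such a technique: keeping every k-th transition of a Markovian trajectory and discarding the samples in between, so that the retained transitions are approximately independent before they are fed to the critic and actor updates. The toy chain `P`, the helper names `collect_markovian_samples` and `data_drop`, and the gap of 50 are illustrative assumptions, not the authors' method; see the linked repository for the actual NAC-DD implementation.

```python
import numpy as np

def collect_markovian_samples(P, num_steps, rng):
    """Roll out a simple finite Markov chain (hypothetical stand-in for
    environment-policy interaction) and return the visited-state sequence."""
    n = P.shape[0]
    s = rng.integers(n)
    states = []
    for _ in range(num_steps):
        s = rng.choice(n, p=P[s])  # Markovian sampling: next state depends on current state
        states.append(s)
    return np.array(states)

def data_drop(samples, drop_gap):
    """Keep only every `drop_gap`-th sample; the retained samples are far less
    correlated, at the cost of discarding the transitions in between."""
    return samples[::drop_gap]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 3-state chain with strong temporal correlation.
    P = np.array([[0.9, 0.1, 0.0],
                  [0.0, 0.9, 0.1],
                  [0.1, 0.0, 0.9]])
    raw = collect_markovian_samples(P, num_steps=10_000, rng=rng)
    thinned = data_drop(raw, drop_gap=50)

    def lag1_autocorr(x):
        # Empirical lag-1 autocorrelation of the state sequence.
        x = x - x.mean()
        return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

    # Thinning sharply reduces the dependence between consecutive retained samples.
    print("lag-1 autocorr (raw):    ", round(lag1_autocorr(raw), 3))
    print("lag-1 autocorr (thinned):", round(lag1_autocorr(thinned), 3))
```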
Latex Source Code: zip
Code Link: https://github.com/LucasCJYSDL/NAC-DD
Signed PMLR Licence Agreement: pdf
Readers: auai.org/UAI/2025/Conference, auai.org/UAI/2025/Conference/Area_Chairs, auai.org/UAI/2025/Conference/Reviewers, auai.org/UAI/2025/Conference/Submission341/Authors, auai.org/UAI/2025/Conference/Submission341/Reproducibility_Reviewers
Submission Number: 341