Finite-Time Analysis of Entropy-Regularized Neural Natural Actor-Critic Algorithm

Published: 06 Apr 2024, Last Modified: 06 Apr 2024, Accepted by TMLR
Abstract: Natural actor-critic (NAC) and its variants, equipped with the representation power of neural networks, have demonstrated impressive empirical success in solving Markov decision problems with large (potentially infinite) state spaces. In this paper, we present a finite-time analysis of NAC with neural network approximation, and identify the roles of neural networks, regularization and optimization techniques (e.g., gradient clipping and weight decay) to achieve provably good performance in terms of sample complexity, iteration complexity and overparametrization bounds for the actor and the critic. In particular, we prove that (i) entropy regularization and weight decay ensure stability by providing sufficient exploration to avoid near-deterministic and strictly suboptimal policies and (ii) regularization leads to sharp sample complexity and network width bounds in the regularized MDPs, yielding a favorable bias-variance tradeoff in policy optimization. In the process, we identify the importance of uniform approximation power of the actor neural network to achieve global optimality in policy optimization due to distributional shift.
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: We thank all the reviewers and the action editor for the very valuable and constructive feedback. To address the reviewers' comments and suggestions, we have made the following changes in this revision:
* We have included additional discussions and remarks on the assumptions and our contributions. In particular, we have (i) expanded the discussions in Sections 4.2 and 4.4 on the realizability assumptions, (ii) provided a discussion in Section B on the sampling oracle assumption, and (iii) discussed the impact of training the output layer in Section 3.1.
* We have included a new result, Corollary 2, that presents a finite-time error bound *without* the realizability assumptions (Assumptions 2-3) and explicitly shows the approximation errors for both the actor and the critic. These approximation error terms are characterized in Section A.4.
* We have revised the notation and the citation styles as suggested by the reviewers, and corrected typos. We have also added pointers throughout the paper to help the reader navigate the content.
Assigned Action Editor: ~Adam_M_White1
Submission Number: 1864