Improved Sample Complexity for Global Convergence of Actor-Critic Algorithms

24 Sept 2024 (modified: 19 Nov 2024) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: Policy Gradient, Actor-Critic Algorithm, Global Convergence, Sample Complexity
TL;DR: Global convergence of the actor-critic method with a sample complexity of $O(\epsilon^{-3})$, compared to existing rates of $O(\epsilon^{-4})$.
Abstract: In this paper, we establish the global convergence of the actor-critic algorithm with a significantly improved sample complexity of \( O(\epsilon^{-3}) \), advancing beyond existing local convergence results. Previous works provide local convergence guarantees with a sample complexity of \( O(\epsilon^{-2}) \) for bounding the squared gradient of the return, which translates to a global sample complexity of \( O(\epsilon^{-4}) \) via the gradient domination lemma. In contrast to traditional methods that employ decreasing step sizes for both the actor and the critic, we demonstrate that a constant step size for the critic suffices for convergence: the decreasing actor step size alone handles the noise from both the actor and the critic. Our findings provide theoretical support for the practical success of many algorithms that rely on constant step sizes.
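The conversion referenced in the abstract, from a stationarity guarantee to a global one, typically goes through a (weak) gradient domination inequality. One common form, stated here only for orientation (the constant \( C \) and bias term \( \varepsilon_{\text{bias}} \) depend on the policy parameterization and distribution mismatch and are not specified in this abstract), is

\[
J(\pi^\star) - J(\theta) \;\le\; C \,\big\|\nabla_\theta J(\theta)\big\| + \varepsilon_{\text{bias}},
\]

so driving the optimality gap below \( \epsilon \) (up to the bias term) requires \( \|\nabla_\theta J(\theta)\|^2 \lesssim \epsilon^2 \); a method that needs \( O(\delta^{-2}) \) samples to reach \( \mathbb{E}\|\nabla_\theta J(\theta)\|^2 \le \delta \) therefore needs \( O(\epsilon^{-4}) \) samples for a global guarantee, which is the baseline the \( O(\epsilon^{-3}) \) result improves on.

The step-size scheme described in the abstract (constant for the critic, decreasing for the actor) can be illustrated with a minimal sketch. This is not the paper's algorithm or analysis: the toy MDP, the \( (t+1)^{-1/2} \) actor schedule, the tabular TD(0) critic, and the TD-error advantage estimate are placeholder choices made here purely for illustration.

```python
import numpy as np

# Illustrative actor-critic sketch: constant critic step size, decreasing actor
# step size, on a small random MDP with a tabular softmax policy. All schedules
# and estimators below are hypothetical placeholders, not the paper's method.

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 3, 0.95

# Random MDP: P[s, a] is a distribution over next states; R[s, a] is the reward.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

theta = np.zeros((n_states, n_actions))   # actor: softmax policy parameters
V = np.zeros(n_states)                    # critic: tabular value estimates

beta = 0.1                                # constant critic step size
alpha0 = 0.5                              # initial actor step size

def policy(s):
    z = theta[s] - theta[s].max()
    p = np.exp(z)
    return p / p.sum()

s = rng.integers(n_states)
for t in range(20000):
    alpha_t = alpha0 / np.sqrt(t + 1)     # decreasing actor step size (illustrative)

    pi_s = policy(s)
    a = rng.choice(n_actions, p=pi_s)
    s_next = rng.choice(n_states, p=P[s, a])
    r = R[s, a]

    # Critic: TD(0) update with the constant step size beta.
    delta = r + gamma * V[s_next] - V[s]
    V[s] += beta * delta

    # Actor: policy-gradient step using the TD error as the advantage estimate;
    # for a softmax policy, grad_log = e_a - pi(.|s).
    grad_log = -pi_s
    grad_log[a] += 1.0
    theta[s] += alpha_t * delta * grad_log

    s = s_next
```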
Primary Area: reinforcement learning
Submission Number: 3716