On the Convergence of Continuous Single-timescale Actor-critic

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: This paper establishes convergence guarantees for single-timescale actor-critic methods in continuous state-action spaces with Markovian sampling.
Abstract: Actor-critic algorithms have been instrumental in boosting the performance of numerous challenging applications involving continuous control, such as highly robust and agile robot motion control. However, their theoretical understanding remains largely underdeveloped. Existing analyses mostly focus on finite state-action spaces and on simplified variants of actor-critic, such as double-loop updates with i.i.d. sampling, which are often impractical for real-world applications. We consider the canonical and widely adopted single-timescale actor-critic updates with Markovian sampling in continuous state-action spaces. Specifically, we establish finite-time convergence by introducing a novel Lyapunov analysis framework, which provides a unified convergence characterization of both the actor and the critic. Our approach is less conservative than previous methods and offers new insights into the coupled dynamics of actor-critic updates.
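For readers unfamiliar with the terminology, the sketch below illustrates what "single-timescale actor-critic with Markovian sampling" generally refers to: the actor and the critic are each updated once per environment step with step sizes of comparable magnitude, and every update uses the next transition of the ongoing trajectory rather than an i.i.d. or restarted sample. The toy linear dynamics, quadratic reward, feature map, policy class, and step sizes are illustrative assumptions; this is a minimal sketch of the general scheme, not the paper's specific construction or analysis.

```python
# Minimal sketch of single-timescale actor-critic with Markovian sampling
# on a toy continuous-control problem (illustrative assumptions throughout).
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, ACTION_DIM = 3, 1
A = 0.9 * np.eye(STATE_DIM)            # assumed toy linear dynamics
B = np.ones((STATE_DIM, ACTION_DIM))

def step(s, a):
    """One Markovian transition: next state and reward from the current (s, a)."""
    s_next = A @ s + B @ a + 0.1 * rng.standard_normal(STATE_DIM)
    r = -float(s @ s + a @ a)          # quadratic cost as negative reward
    return s_next, r

def features(s):
    """Critic feature map phi(s); linear value estimate V(s) = w^T phi(s)."""
    return np.concatenate([s, s**2, [1.0]])

def policy_mean(theta, s):
    """Gaussian policy mean mu_theta(s) = theta s, with a fixed std below."""
    return theta @ s

theta = np.zeros((ACTION_DIM, STATE_DIM))          # actor parameters
w = np.zeros(features(np.zeros(STATE_DIM)).size)   # critic parameters
sigma = 0.5                                        # fixed exploration std
alpha, beta = 1e-3, 1e-2                           # actor / critic step sizes (same timescale)
gamma = 0.95

s = rng.standard_normal(STATE_DIM)
for t in range(10_000):
    # Sample an action from the current Gaussian policy.
    mu = policy_mean(theta, s)
    a = mu + sigma * rng.standard_normal(ACTION_DIM)

    # Markovian sampling: the transition continues the ongoing trajectory.
    s_next, r = step(s, a)

    # Critic: one TD(0) update from this single transition.
    phi, phi_next = features(s), features(s_next)
    delta = r + gamma * (w @ phi_next) - (w @ phi)
    w = w + beta * delta * phi

    # Actor: one policy-gradient update using the same TD error as the
    # advantage estimate; both updates happen every step (single timescale).
    grad_log_pi = np.outer((a - mu) / sigma**2, s)  # grad_theta log pi_theta(a|s)
    theta = theta + alpha * delta * grad_log_pi

    s = s_next
```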
Lay Summary: Actor-critic algorithms have played a pivotal role in advancing performance across a range of challenging continuous control tasks, including robust and agile robotic motion. In this work, we establish the finite-time convergence of the widely used single-timescale actor-critic algorithm with Markovian sampling in continuous state-action spaces. This result bridges the gap between practical implementations and theoretical guarantees, and offers a promising method for analyzing other single-timescale reinforcement learning algorithms.
Primary Area: Theory->Learning Theory
Keywords: single-timescale actor-critic, continuous state-action space, Markovian sampling
Submission Number: 4747