Warm-Start Actor-Critic: From Approximation Error to Sub-optimality Gap

Published: 24 Apr 2023, Last Modified: 21 Jun 2023ICML 2023 OralPosterEveryoneRevisions
Abstract: Warm-Start reinforcement learning (RL), aided by a prior policy obtained from offline training, is emerging as a promising RL approach for practical applications. Recent empirical studies have demonstrated that the performance of Warm-Start RL can be improved *quickly* in some cases but become *stagnant* in other cases, especially when the function approximation is used. To this end, the primary objective of this work is to build a fundamental understanding on ''whether and when online learning can be significantly accelerated by a warm-start policy from offline RL?''. Specifically, we consider the widely used Actor-Critic (A-C) method with a prior policy. We first quantify the approximation errors in the Actor update and the Critic update, respectively. Next, we cast the Warm-Start A-C algorithm as Newton's method with perturbation, and study the impact of the approximation errors on the finite-time learning performance with inaccurate Actor/Critic updates. Under some general technical conditions, we derive the upper bounds, which shed light on achieving the desired finite-learning performance in the Warm-Start A-C algorithm. In particular, our findings reveal that it is essential to reduce the algorithm bias in online learning. We also obtain lower bounds on the sub-optimality gap of the Warm-Start A-C algorithm to quantify the impact of the bias and error propagation.
Submission Number: 6102