Policy gradient finds global optimum of nearly linear-quadratic control systemsDownload PDF

Published: 23 Nov 2022, Last Modified: 05 May 2023OPT 2022 PosterReaders: Everyone
Abstract: We explore reinforcement learning methods for finding the optimal policy in the nearly linear-quadratic control systems. In particular, we consider a dynamic system composed of the summation of a linear and a nonlinear components, which is governed by a policy with the same structure. Assuming that the nonlinear part consists of kernels with small Lipschitz coefficients, we characterize the optimization landscape of the cost function. While the resulting landscape is generally nonconvex, we show local strong convexity and smoothness of the cost function around the global optimizer. In addition, we design a policy gradient algorithm with a carefully chosen initialization and prove that the algorithm is guaranteed to converge to the globally optimal policy with a linear rate.
0 Replies