Abstract: We explore reinforcement learning methods for finding the optimal policy in the nearly linear-quadratic control systems. In particular, we consider a dynamic system composed of the summation of a linear and a nonlinear components, which is governed by a policy with the same structure. Assuming that the nonlinear part consists of kernels with small Lipschitz coefficients, we characterize the optimization landscape of the cost function. While the resulting landscape is generally nonconvex, we show local strong convexity and smoothness of the cost function around the global optimizer. In addition, we design a policy gradient algorithm with a carefully chosen initialization and prove that the algorithm is guaranteed to converge to the globally optimal policy with a linear rate.