Keywords: optimal control, reinforcement learning
TL;DR: Using a novel representation of symmetric linear dynamical systems with a latent state, we formulate optimal control as a convex program, giving the first polynomial-time algorithm that solves optimal control with sample complexity only polylogarithmic in the time horizon.
Abstract: We study the control of symmetric linear dynamical systems with unknown dynamics and a hidden state. Using a recent spectral filtering technique for concisely representing such systems in a linear basis, we formulate optimal control in this setting as a convex program. This approach eliminates the need to solve the non-convex problem of explicit identification of the system and its latent state, and allows for provable optimality guarantees for the control signal. We give the first efficient algorithm for finding the optimal control signal with an arbitrary time horizon T, with sample complexity (number of training rollouts) polynomial only in log(T) and other relevant parameters.