Abstract: We address the problem of online learning in predictive control of an unknown linear dynamical system with time-varying cost functions. We consider the setting where the control algorithm does not know the true system model and only has access to a fixed-length preview of the future cost functions, whose length does not grow with the control horizon. We characterize the performance of the algorithm using the metric of dynamic regret, defined as the difference between the cumulative cost incurred by the algorithm and that of the best sequence of actions in hindsight. We propose a novel online learning predictive control algorithm, Optimistic MPC (O-MPC). We show that under the standard stability assumption on the true underlying system, the O-MPC algorithm achieves $\mathcal{O}(T^{2/3})$ dynamic regret.
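As a sketch of the regret metric described above (the symbols $x_t$, $u_t$, $c_t$ for states, controls, and cost functions, and $T$ for the control horizon are assumed notation, not taken from the paper), dynamic regret can be written as
$$
\mathrm{Regret}_T \;=\; \sum_{t=1}^{T} c_t(x_t, u_t) \;-\; \min_{(u_1^*,\dots,u_T^*)} \sum_{t=1}^{T} c_t(x_t^*, u_t^*),
$$
where $(x_t^*, u_t^*)$ denotes the state-action trajectory generated by the best sequence of actions chosen in hindsight with full knowledge of the cost functions.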