Abstract: In this article, we consider the problem of controlling an unknown linear quadratic Gaussian (LQG) system consisting of multiple subsystems connected over a network. Our goal is to minimize and quantify the regret (i.e., loss in performance) of our learning and control strategy with respect to an oracle who knows the system model. Upfront viewing the interconnected subsystems globally and directly using existing LQG learning algorithms for the global system results in a regret that increases super-linearly with the number of subsystems. Instead, we propose a new Thompson sampling-based learning algorithm which exploits the structure of the underlying network. We show that the expected regret of the proposed algorithm is bounded by <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><tex-math notation="LaTeX">$\tilde{\mathcal {O}} (n \sqrt{T})$</tex-math></inline-formula> , where <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><tex-math notation="LaTeX">$n$</tex-math></inline-formula> is the number of subsystems and <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><tex-math notation="LaTeX">$T$</tex-math></inline-formula> is the time horizon. Thus, the regret scales linearly with the number of subsystems. We present numerical experiments to illustrate the salient features of the proposed algorithm.
0 Replies
Loading