A modified Thompson sampling-based learning algorithm for unknown linear systems

Mukul Gagrani, Sagar Sudhakara, Aditya Mahajan, Ashutosh Nayyar, Yi Ouyang

Published: 2022, Last Modified: 15 May 2023, CDC 2022, Readers: Everyone

Abstract: We revisit the Thompson sampling-based learning algorithm for controlling an unknown linear system with quadratic cost proposed in [1]. This algorithm operates in episodes of dynamic length and is shown to have a regret bound of $\tilde{\mathcal{O}}(\sqrt{T})$, where $T$ is the time horizon. The regret bound of this algorithm is obtained under a technical assumption on the induced norm of the closed-loop system. We propose a variation of this algorithm that enforces a lower bound $T_{\min}$ on the episode length. We show that a careful choice of $T_{\min}$ (that depends on the uncertainty about the system model) allows us to recover the $\tilde{\mathcal{O}}(\sqrt{T})$ regret bound under a milder technical condition on the closed-loop system.
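
To illustrate what enforcing a lower bound on the episode length could look like, here is a minimal Python sketch of an episode schedule. The abstract does not state the stopping rule, so everything below is an assumption: the function name `episode_starts`, the two triggers (the current episode outlasting the previous one, or a posterior-covariance determinant `dets[t]` halving since the episode began), and the convention that neither trigger is checked until more than `t_min` steps have elapsed. It is not the authors' algorithm, only one plausible reading of "dynamic episodes with a minimum length".

```python
def episode_starts(horizon, t_min, dets):
    """Return the time steps at which a new episode begins.

    Hypothetical rule (an assumption, not taken from the paper):
    once more than t_min steps have elapsed in the current episode,
    start a new one if the episode is longer than the previous
    episode, or if dets[t] has dropped below half of its value at
    the start of the current episode.
    """
    starts = [0]
    prev_len = 0  # length of the previous episode (0 before the first)
    for t in range(1, horizon):
        t_k = starts[-1]
        elapsed = t - t_k
        if elapsed <= t_min:
            continue  # lower bound on episode length: no trigger is checked
        outlasted_previous = elapsed > prev_len
        uncertainty_halved = dets[t] < 0.5 * dets[t_k]
        if outlasted_previous or uncertainty_halved:
            prev_len = elapsed
            starts.append(t)
    return starts


if __name__ == "__main__":
    # Constant determinant: only the episode-length trigger can fire.
    print(episode_starts(20, 3, [1.0] * 20))
    # Rapidly shrinking determinant: episodes restart as soon as t_min allows.
    print(episode_starts(20, 3, [0.5 ** t for t in range(20)]))
```

In this sketch, every completed episode lasts more than `t_min` steps regardless of how quickly the determinant shrinks, which is the only property the abstract itself describes.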
