Keywords: Model-based, linear quadratic regulator, exploration, minimum empirical divergence
Abstract: We revisit the problem of controlling linear systems with quadratic cost, i.e., the Linear Quadratic Regulator (LQR), under unknown dynamics via model-based reinforcement learning. Traditional methods such as Optimism in the Face of Uncertainty and Thompson Sampling, rooted in multi-armed bandits (MABs), face practical limitations. In contrast, we propose an alternative based on the *Confusing Instance* (CI) principle, which underpins regret lower bounds in MABs and discrete Markov Decision Processes (MDPs) and is central to the *Minimum Empirical Divergence* (MED) family of algorithms, known for their asymptotic optimality in various settings. By leveraging the structure of LQR policies along with sensitivity and stability analysis, we develop `MED-LQ`. This novel control strategy extends the principles of CI and MED beyond small-scale settings. Our benchmarks on a comprehensive control suite demonstrate that `MED-LQ` achieves competitive performance in a range of scenarios while highlighting its potential for broader applications in large-scale MDPs.
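For background (not stated in the abstract): in the multi-armed bandit setting, the MED rule of Honda and Takemura selects each arm with probability proportional to the exponential of its negative (scaled) minimum empirical divergence. A rough sketch with our own notation, given only as context for the CI/MED terminology above and not as the rule used by `MED-LQ`:

$$\mathbb{P}(A_t = a) \;\propto\; \exp\!\big(-n_a(t)\, D^{\min}_a(t)\big), \qquad D^{\min}_a(t) \;=\; \inf_{G:\ \mathbb{E}[G] \,\ge\, \hat{\mu}^*(t)} \mathrm{KL}\big(\hat{F}_a(t) \,\|\, G\big),$$

where $n_a(t)$ is the number of pulls of arm $a$, $\hat{F}_a(t)$ its empirical reward distribution, and $\hat{\mu}^*(t)$ the best empirical mean. The abstract indicates that `MED-LQ` extends this idea to LQR via sensitivity and stability analysis; the precise rule is not given here.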
Submission Number: 89