Handling Heterogeneous Curvatures in Bandit LQR Control

Published: 02 May 2024, Last Modified: 25 Jun 2024ICML 2024 SpotlightEveryoneRevisionsBibTeXCC BY 4.0
Abstract: We investigate online Linear Quadratic Regulator (LQR) with bandit feedback and semi-adversarial disturbances. Previous works assume costs with *homogeneous* curvatures (i.e., with a uniform strong convexity lower bound), which can be hard to satisfy in many real scenarios and prohibits adapting to true curvatures for better performance. In this paper, we initiate the study of bandit LQR control with *heterogeneous* cost curvatures, aiming to strengthen the algorithm's adaptivity. To achieve this, we reduce the problem to bandit convex optimization with memory via a ``with-history'' reduction to avoid hard-to-control truncation errors. Then we provide a novel analysis for an important *stability* term that appeared in both regret and memory, using *Newton decrement* developed in interior-point methods. The analysis enables us to guarantee memory-related terms introduced in the reduction and also provide a simplified analysis for handling heterogeneous curvatures in bandit convex optimization. Finally, we achieve interpolated guarantees that can not only recover existing bounds for convex and quadratic costs but also attain new implications for cases of corrupted and decaying quadraticity.
Submission Number: 4916
Loading