Regret Analysis of Hybrid Linear Bandits with Biased Offline Data

16 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Multi-armed bandits, Linear multi-armed bandits, Offline-to-online, Hybrid learning
Abstract: Linear bandits have been studied extensively owing to their broad applications and solid theoretical foundations. However, purely online algorithms often incur high exploration costs, while purely offline approaches depend critically on the quality of the offline data. To bridge this gap, we study a hybrid setting in which (possibly biased) offline data is available during online learning. We propose Hybrid LinUCB, an algorithm that leverages both offline and online information by constructing two confidence ellipsoids, trading off the offline bias against the size of the offline dataset. We establish an upper bound and a nearly matching lower bound that explicitly capture the dependence on the bias upper bound $V$ and the spectrum of the offline feature matrix $V_{0}$. Compared with existing work, our algorithm requires weaker assumptions on the offline data and exhibits stronger adaptability. Moreover, our theoretical analysis recovers and unifies prior guarantees across different settings.
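To make the offline-to-online idea concrete, here is a minimal sketch of a generic LinUCB variant that warm-starts its ridge design matrix with offline feature/reward pairs. It is not the paper's Hybrid LinUCB (which maintains two confidence ellipsoids to balance offline bias against offline sample size); the function name, parameters, and the single-ellipsoid warm-start are assumptions made purely for illustration.

```python
import numpy as np

def offline_warmstart_linucb(arms, reward_fn, offline_X, offline_y,
                             horizon=1000, reg=1.0, alpha=1.0):
    """Illustrative sketch (not the paper's algorithm).

    arms:      (K, d) array of fixed arm feature vectors.
    reward_fn: callable taking an arm index and returning a noisy reward.
    offline_X: (n0, d) offline feature matrix; offline_y: (n0,) offline rewards.
    """
    d = arms.shape[1]
    # Ridge-regularized design matrix seeded with the offline data;
    # offline_X.T @ offline_X plays the role of the offline matrix V_0
    # mentioned in the abstract (assumption for this sketch).
    V = reg * np.eye(d) + offline_X.T @ offline_X
    b = offline_X.T @ offline_y
    total_reward = 0.0
    for _ in range(horizon):
        V_inv = np.linalg.inv(V)
        theta_hat = V_inv @ b  # ridge estimate of the unknown parameter
        # Optimistic index: estimated mean reward plus an exploration bonus
        # proportional to the arm's norm in the inverse design matrix.
        bonus = np.sqrt(np.einsum('ki,ij,kj->k', arms, V_inv, arms))
        a = int(np.argmax(arms @ theta_hat + alpha * bonus))
        x = arms[a]
        r = reward_fn(a)            # observe a noisy reward for the chosen arm
        V += np.outer(x, x)         # online rank-one design-matrix update
        b += r * x
        total_reward += r
    return theta_hat, total_reward
```

In this sketch the offline data simply shrinks the confidence ellipsoid; when the offline data is biased, this single-ellipsoid treatment has no mechanism to discount it, which is the gap the abstract's two-ellipsoid construction is meant to address.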
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 6971