The Power of Feel-Good Thompson Sampling: A Unified Framework for Linear Bandits

Published: 01 Feb 2023, Last Modified: 13 Feb 2023. Submitted to ICLR 2023. Readers: Everyone
Abstract: The linear contextual bandit is one of the most popular models in online decision-making with bandit feedback. Prior work has studied different variants of this model, e.g., misspecified, non-stationary, and multi-task/lifelong linear contextual bandits. However, no single framework unifies the algorithm design and analysis for these variants. In this paper, we propose a unified framework for linear contextual bandits based on feel-good Thompson sampling (Zhang, 2021). The algorithm derived from our framework achieves nearly minimax optimal regret in various settings and resolves the respective open problem in each setting. Specifically, letting $d$ be the dimension of the context and $T$ the length of the horizon, our algorithm achieves an $\widetilde{\mathcal{O}}(d\sqrt{ST})$ regret bound for non-stationary linear bandits with at most $S$ switches, an $\widetilde{\mathcal{O}}(d^{\frac{5}{6}} T^{\frac{2}{3}} P^{\frac{1}{3}})$ regret bound for non-stationary linear bandits with bounded path length $P$, and an $\widetilde{\mathcal{O}}(d\sqrt{kT} + \sqrt{dkMT})$ regret bound for (generalized) lifelong linear bandits over $M$ tasks that share an unknown representation of dimension $k$. We believe our framework will shed light on the design and analysis of other linear contextual bandit variants.
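To make the scaling of the three stated regret bounds concrete, the following minimal Python sketch evaluates the expressions inside each $\widetilde{\mathcal{O}}(\cdot)$. It is only an illustration: the function names are hypothetical, and the constants and logarithmic factors hidden by $\widetilde{\mathcal{O}}$ are dropped, so the outputs are useful for comparing growth rates rather than for predicting actual regret.

```python
import math


def switching_bound(d: int, T: int, S: int) -> float:
    """d * sqrt(S * T): non-stationary linear bandits with at most S switches."""
    return d * math.sqrt(S * T)


def path_length_bound(d: int, T: int, P: float) -> float:
    """d^(5/6) * T^(2/3) * P^(1/3): non-stationary bandits with path length P."""
    return d ** (5 / 6) * T ** (2 / 3) * P ** (1 / 3)


def lifelong_bound(d: int, T: int, k: int, M: int) -> float:
    """d * sqrt(k * T) + sqrt(d * k * M * T): lifelong bandits over M tasks
    sharing an unknown representation of dimension k."""
    return d * math.sqrt(k * T) + math.sqrt(d * k * M * T)


if __name__ == "__main__":
    # Assumed example values, chosen only for illustration.
    d, T = 10, 10_000
    print("switching (S=4):   ", switching_bound(d, T, S=4))
    print("path length (P=8): ", path_length_bound(d, T, P=8))
    print("lifelong (k=2,M=5):", lifelong_bound(d, T, k=2, M=5))
```

For fixed $d$, the switching bound grows as $\sqrt{T}$ while the path-length bound grows as $T^{2/3}$, so quadrupling $T$ doubles the former and multiplies the latter by $4^{2/3} \approx 2.52$.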
Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (e.g., decision and control, planning, hierarchical RL, robotics)