No-Regret Linear Bandits beyond Realizability

Chong Liu; Ming Yin; Yu-Xiang Wang

Published: 08 May 2023, Last Modified: 26 Jun 2023, UAI 2023

Keywords: linear bandit, misspecified bandit, optimization, no-regret algorithm

TL;DR: A no-regret algorithm solves the linear bandit problem without realizability.

Abstract: We study linear bandits when the underlying reward function is not linear. Existing work relies on a uniform misspecification parameter $\epsilon$ that measures the sup-norm error of the best linear approximation. This results in an unavoidable linear regret whenever $\epsilon > 0$. We describe a more natural model of misspecification that only requires the approximation error at each input $x$ to be proportional to the suboptimality gap at $x$. It captures the intuition that, for optimization problems, near-optimal regions should matter more, so larger approximation errors can be tolerated in suboptimal regions. Quite surprisingly, we show that the classical LinUCB algorithm, designed for the realizable case, is automatically robust against such gap-adjusted misspecification. It achieves a near-optimal $\sqrt{T}$ regret for problems where the best-known regret is almost linear in the time horizon $T$. Technically, our proof relies on a novel self-bounding argument that bounds the part of the regret due to misspecification by the regret itself.
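To make the gap-adjusted condition concrete, here is a minimal pure-Python sketch under stated assumptions; it is an illustration, not the authors' implementation. It assumes a finite set of arms with feature vectors, formalizes the condition as $|f(x) - \theta^\top x| \le \rho\,(f(x^*) - f(x))$ for a hypothetical constant $\rho$, and runs a textbook LinUCB with ridge parameter `lam` and a fixed confidence width `beta` (both hypothetical choices, not a theoretically tuned schedule). The helper names `gap_adjusted_rho`, `make_reward_fn`, and `linucb`, the Gaussian noise model, and the three-arm instance are all made up for illustration.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [dot(row, v) for row in M]


def gap_adjusted_rho(rewards, features, theta):
    """Hypothetical helper: smallest rho with
    |f(x) - theta.x| <= rho * (f* - f(x)) on every arm.

    Returns math.inf when an optimal arm (zero gap) has nonzero error,
    since no finite rho can then satisfy the condition.
    """
    f_star = max(rewards)
    rho = 0.0
    for f, x in zip(rewards, features):
        err = abs(f - dot(theta, x))
        gap = f_star - f
        if gap <= 0.0:
            if err > 1e-12:
                return math.inf
        else:
            rho = max(rho, err / gap)
    return rho


def make_reward_fn(rewards, sigma=0.1, seed=0):
    """Noisy reward oracle: mean rewards[k] plus Gaussian noise (an assumption)."""
    rng = random.Random(seed)
    return lambda k: rewards[k] + rng.gauss(0.0, sigma)


def linucb(features, reward_fn, T, lam=1.0, beta=1.0):
    """Textbook LinUCB with ridge parameter lam and a fixed width beta.

    Keeps A^{-1} for A = lam*I + sum of x x^T via Sherman-Morrison updates,
    so no general matrix inversion is needed. Returns pulled arm indices.
    """
    d = len(features[0])
    A_inv = [[1.0 / lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    pulls = []
    for _ in range(T):
        theta_hat = matvec(A_inv, b)  # ridge estimate A^{-1} b
        scores = []
        for x in features:
            width = math.sqrt(max(dot(x, matvec(A_inv, x)), 0.0))
            scores.append(dot(theta_hat, x) + beta * width)  # optimistic index
        k = max(range(len(features)), key=scores.__getitem__)
        x = features[k]
        y = reward_fn(k)
        # Sherman-Morrison (A_inv is symmetric):
        # (A + x x^T)^{-1} = A^{-1} - (A^{-1} x)(A^{-1} x)^T / (1 + x^T A^{-1} x)
        Ax = matvec(A_inv, x)
        denom = 1.0 + dot(x, Ax)
        A_inv = [[A_inv[i][j] - Ax[i] * Ax[j] / denom for j in range(d)]
                 for i in range(d)]
        b = [b[i] + y * x[i] for i in range(d)]
        pulls.append(k)
    return pulls


# Hypothetical 3-arm instance: f is not linear, but its error w.r.t.
# theta = (1, 0) is zero at the optimum and a quarter of the gap elsewhere.
FEATURES = [(1.0, 0.0), (0.0, 1.0), (0.7, 0.7)]
REWARDS = [1.0, 0.2, 0.6]
THETA = (1.0, 0.0)

if __name__ == "__main__":
    print(gap_adjusted_rho(REWARDS, FEATURES, THETA))
    pulls = linucb(FEATURES, make_reward_fn(REWARDS), T=2000)
    print([pulls.count(k) for k in range(len(FEATURES))])
```

In this toy instance the linear errors are 0, 0.2, and 0.1, so a uniform bound would need $\epsilon = 0.2$, while `gap_adjusted_rho` returns approximately 0.25 because the error vanishes at the optimal arm and is at most a quarter of the gap elsewhere. Under these assumptions one would expect LinUCB's pulls to concentrate on arm 0, in line with the robustness claim above; the sketch demonstrates the setting, not the paper's regret bound.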
