LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits

Published: 22 Jan 2025, Last Modified: 11 Mar 2025. AISTATS 2025 Poster. License: CC BY 4.0
Abstract: We investigate the \emph{linear contextual bandit problem} with independent and identically distributed (i.i.d.) contexts. In this problem, we aim to develop a \emph{Best-of-Both-Worlds} (BoBW) algorithm, that is, one achieving favorable regret upper bounds in both the stochastic and adversarial regimes. We develop an algorithm based on \emph{Follow-The-Regularized-Leader} (FTRL) with Tsallis entropy, referred to as the $\alpha$-\emph{Linear-Contextual (LC)-Tsallis-INF}. We show that its regret is at most $O(\log(T))$ in the stochastic regime, under the assumption that the suboptimality gap is uniformly bounded from below, and at most $O(\sqrt{T})$ in the adversarial regime. Furthermore, we extend our regret analysis to more general regimes characterized by the \emph{margin condition} with a parameter $\beta \in (1, \infty]$, which imposes a milder assumption on the suboptimality gap than previous studies. We show that the proposed algorithm achieves $O\left(\log(T)^{\frac{1+\beta}{2+\beta}}T^{\frac{1}{2+\beta}}\right)$ regret under the margin condition. Note that this bound interpolates between the two regimes: as $\beta \to \infty$ the exponents give $O(\log(T))$, matching the stochastic rate, while at $\beta = 1$ it reads $O(\log(T)^{2/3}T^{1/3})$.
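To make the FTRL-with-Tsallis-entropy step concrete, here is a minimal sketch of the per-round optimization over a finite set of arm probabilities. It is illustrative only, not the paper's full LC-Tsallis-INF: the actual algorithm additionally conditions on the observed i.i.d. context and feeds FTRL with unbiased loss estimates, neither of which is modeled here. The function name `tsallis_ftrl_weights`, the regularizer scaling, and the bisection scheme for the normalizing multiplier are our own assumptions.

```python
import numpy as np

def tsallis_ftrl_weights(cum_loss, eta, alpha=0.5, tol=1e-10, max_iter=200):
    """Arm-selection probabilities for FTRL with alpha-Tsallis-entropy
    regularization (a sketch, not the paper's exact LC-Tsallis-INF):

        p = argmin_{p in simplex} <cum_loss, p> + psi(p) / eta,
        psi(p) = (1 - sum_i p_i**alpha) / (1 - alpha),  alpha in (0, 1).

    First-order conditions give
        p_i = (c / (cum_loss_i + lam)) ** (1 / (1 - alpha)),
        c   = alpha / ((1 - alpha) * eta),
    where the multiplier lam is chosen so that sum_i p_i = 1.
    """
    cum_loss = np.asarray(cum_loss, dtype=float)
    K = len(cum_loss)
    c = alpha / ((1.0 - alpha) * eta)

    def probs(lam):
        return (c / (cum_loss + lam)) ** (1.0 / (1.0 - alpha))

    # probs(lam) is positive and strictly decreasing in lam, so we can
    # bisect on sum(probs(lam)) = 1 over (lam_low, lam_high):
    # the sum diverges as lam -> -min(cum_loss) and is < 1 at lam_high.
    lam_low = -np.min(cum_loss) + 1e-12
    lam_high = -np.min(cum_loss) + c * K ** (1.0 - alpha) + 1.0
    for _ in range(max_iter):
        lam = 0.5 * (lam_low + lam_high)
        s = probs(lam).sum()
        if abs(s - 1.0) < tol:
            break
        if s > 1.0:
            lam_low = lam
        else:
            lam_high = lam
    p = probs(lam)
    return p / p.sum()  # clean up residual numerical error
```

For example, `tsallis_ftrl_weights(np.array([3.0, 1.0, 2.5]), eta=0.2)` puts most mass on the second (lowest-loss) arm while keeping the others alive for exploration. With $\alpha = 1/2$ the formula reduces to $p_i \propto \big(\eta(\hat{L}_i + \lambda)\big)^{-2}$, the familiar Tsallis-INF weights; varying $\alpha$ is exactly the generalization the $\alpha$-LC-Tsallis-INF name refers to.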
Submission Number: 1366