Improved Regret Bounds for Linear Bandits with Heavy-Tailed Rewards

Published: 18 Sept 2025, Last Modified: 29 Oct 2025 · NeurIPS 2025 poster · CC BY-NC 4.0
Keywords: linear bandits, heavy-tailed, experimental design, regret analysis, online learning
Abstract: We study stochastic linear bandits with heavy-tailed rewards, where the rewards have a finite $(1+\epsilon)$-absolute central moment bounded by $\upsilon$ for some $\epsilon \in (0,1]$. We improve both the upper and lower bounds on the minimax regret compared to prior work. When $\upsilon = \mathcal{O}(1)$, the best previously known regret upper bound is $\tilde{\mathcal{O}}(d T^{\frac{1}{1+\epsilon}})$. While a lower bound with the same scaling has been given, it relies on a construction using $\upsilon = d$, and adapting the construction to the bounded-moment regime with $\upsilon = \mathcal{O}(1)$ yields only an $\Omega(d^{\frac{\epsilon}{1+\epsilon}} T^{\frac{1}{1+\epsilon}})$ lower bound. This matches the known rate for multi-armed bandits and is generally loose for linear bandits; in particular, it is a factor of $\sqrt{d}$ below the optimal rate in the finite-variance case ($\epsilon = 1$). We propose a new elimination-based algorithm guided by experimental design, which achieves regret $\tilde{\mathcal{O}}(d^{\frac{1+3\epsilon}{2(1+\epsilon)}} T^{\frac{1}{1+\epsilon}})$, thus improving the dependence on $d$ for all $\epsilon \in (0,1)$ and recovering a known optimal result for $\epsilon = 1$. We also establish a lower bound of $\Omega(d^{\frac{2\epsilon}{1+\epsilon}} T^{\frac{1}{1+\epsilon}})$, which strictly improves upon the multi-armed bandit rate and highlights the hardness of heavy-tailed linear bandit problems. For finite action sets of size $n$, we derive upper and lower bounds of $\tilde{\mathcal{O}}(\sqrt{d}\,(\log n)^{\frac{\epsilon}{1+\epsilon}} T^{\frac{1}{1+\epsilon}})$ and $\tilde{\Omega}(d^{\frac{\epsilon}{1+\epsilon}}(\log n)^{\frac{\epsilon}{1+\epsilon}} T^{\frac{1}{1+\epsilon}})$, respectively. Finally, we provide action-set-dependent regret upper bounds and show that for some geometries, such as $\ell_p$-norm balls for $p \le 1 + \epsilon$, we can further reduce the dependence on $d$.
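To make the abstract's algorithmic idea concrete, here is a minimal sketch of the general template it names: phased elimination over the action set, where each phase samples according to an approximate G-optimal experimental design and fits the parameter with a truncated least-squares estimator to handle heavy tails. Everything below is an illustrative assumption, not the paper's exact method: the Frank-Wolfe design routine, the Student-t noise model, and the phase-budget, truncation, and elimination-radius constants are standard stand-ins.

```python
import numpy as np

def g_optimal_design(arms, iters=300):
    """Frank-Wolfe (Fedorov/Wynn) iterations for an approximate G-optimal
    design: a distribution pi over arms that roughly minimizes
    max_a a^T V(pi)^{-1} a, where V(pi) = sum_a pi_a a a^T."""
    n, d = arms.shape
    pi = np.full(n, 1.0 / n)
    for _ in range(iters):
        V = arms.T @ (arms * pi[:, None])
        Vinv = np.linalg.pinv(V)
        g = np.einsum("ij,jk,ik->i", arms, Vinv, arms)  # g_i = a_i^T V^{-1} a_i
        j = int(np.argmax(g))
        # Closed-form step size; g[j] >= d away from the optimum.
        gamma = max(0.0, (g[j] / d - 1.0) / (g[j] - 1.0)) if g[j] > 1.0 else 0.0
        pi *= 1.0 - gamma
        pi[j] += gamma
    return pi

def truncated_ls(X, y, tau):
    """Least squares on rewards clipped at level tau -- one standard
    robustification under a bounded (1+eps)-th moment (illustrative;
    the paper's estimator may differ)."""
    y_t = np.clip(y, -tau, tau)
    return np.linalg.pinv(X.T @ X) @ (X.T @ y_t)

def phased_elimination(arms, theta_star, T, eps=0.5, upsilon=1.0, seed=0):
    """Illustrative phased elimination for heavy-tailed linear bandits.
    Rewards are simulated as <a, theta*> + Student-t noise, whose
    (1+eps)-th moment is finite since df = 1.5 + eps > 1 + eps.
    Phase budgets, truncation level, and elimination radius are heuristic."""
    rng = np.random.default_rng(seed)
    d = arms.shape[1]
    means = arms @ theta_star
    opt = means.max()
    active = np.arange(len(arms))
    t, phase, regret = 0, 1, 0.0
    while t < T:
        A = arms[active]
        pi = g_optimal_design(A)
        m = d * 2 ** phase                      # geometric phase budget
        counts = np.ceil(pi * m).astype(int)
        X, y = [], []
        for i in range(len(A)):
            for _ in range(counts[i]):
                if t >= T:
                    break
                r = A[i] @ theta_star + rng.standard_t(df=1.5 + eps)
                X.append(A[i]); y.append(r)
                regret += opt - means[active[i]]
                t += 1
        X, y = np.asarray(X), np.asarray(y)
        # Heuristic truncation level ~ (upsilon * n_samples)^{1/(1+eps)}.
        tau = (upsilon * len(y)) ** (1.0 / (1.0 + eps))
        est = A @ truncated_ls(X, y, tau)
        width = 4.0 * (d / m) ** (eps / (1.0 + eps))  # heuristic radius
        active = active[est >= est.max() - width]
        if len(active) == 1:                    # commit to the sole survivor
            regret += (T - t) * (opt - means[active[0]])
            break
        phase += 1
    return regret

# Toy run: 50 unit-norm arms in R^5, eps = 0.5 heavy-tailed noise.
rng = np.random.default_rng(1)
arms = rng.normal(size=(50, 5))
arms /= np.linalg.norm(arms, axis=1, keepdims=True)
theta = rng.normal(size=5)
theta /= np.linalg.norm(theta)
print("cumulative regret:", phased_elimination(arms, theta, T=20000))
```

The experimental-design step is what separates the linear setting from the multi-armed one: sampling from the design distribution controls the estimation error uniformly over all surviving directions with roughly $d$ distinct support points per phase, which is where an improved dependence on $d$ can enter.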
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 18412