Minimum Empirical Divergence for Sub-Gaussian Linear Bandits

Published: 22 Jan 2025, Last Modified: 11 Mar 2025, AISTATS 2025 Poster, CC BY 4.0
TL;DR: A new algorithm for linear bandits with a state-of-the-art regret bound and competitive empirical performance.
Abstract: We propose a novel linear bandit algorithm called LinMED (Linear Minimum Empirical Divergence), a linear extension of the MED algorithm originally designed for multi-armed bandits. LinMED is a randomized algorithm that admits a closed-form computation of the arm sampling probabilities, unlike the popular randomized algorithm linear Thompson sampling. This feature proves useful for off-policy evaluation, where unbiased evaluation requires accurately computing the sampling probabilities. We prove that LinMED enjoys a near-optimal regret bound of order $d\sqrt{n}$ up to logarithmic factors, where $d$ is the dimension and $n$ is the time horizon. We further show that LinMED enjoys a problem-dependent regret bound of order $\frac{d^2}{\Delta}\log^2(n)\log(\log(n))$, where $\Delta$ is the smallest suboptimality gap. Our empirical study shows that LinMED is competitive with state-of-the-art algorithms.
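To illustrate why closed-form sampling probabilities matter, below is a minimal Python sketch. It is not the paper's LinMED rule: it implements the original MED rule for Gaussian multi-armed bandits (each arm's probability proportional to $\exp(-N_a D_a)$, where for unit-variance Gaussians the empirical divergence $D_a$ reduces to the squared empirical gap over $2\sigma^2$), and then shows how such explicitly computable probabilities plug into a standard inverse-propensity-scoring (IPS) off-policy value estimate. The function names `med_probs` and `ips_value` are hypothetical.

```python
import numpy as np

def med_probs(means: np.ndarray, counts: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """MED-style sampling probabilities for Gaussian rewards (a sketch,
    not the paper's linear-bandit rule): p_a ∝ exp(-N_a (mu* - mu_a)^2 / (2 sigma^2))."""
    gaps = means.max() - means                       # empirical suboptimality gaps
    logits = -counts * gaps**2 / (2.0 * sigma**2)    # -N_a * empirical divergence
    w = np.exp(logits - logits.max())                # subtract max for stability
    return w / w.sum()                               # closed-form distribution over arms

def ips_value(target_probs: np.ndarray, logged_probs: np.ndarray,
              logged_arms: np.ndarray, logged_rewards: np.ndarray) -> float:
    """Unbiased IPS estimate of a target policy's value from logged data;
    logged_probs[t] is the probability the logging policy assigned to the arm
    it actually pulled at round t, which is why it must be exactly computable."""
    weights = target_probs[logged_arms] / logged_probs
    return float(np.mean(weights * logged_rewards))

# Usage: sample an arm from the closed-form distribution (toy numbers).
rng = np.random.default_rng(0)
means = np.array([0.50, 0.45, 0.10])
counts = np.array([40, 35, 10])
p = med_probs(means, counts)
arm = int(rng.choice(len(p), p=p))
```

Because the sampling distribution is available in closed form at every round, the propensities passed to `ips_value` are exact rather than approximated, which is the property the abstract highlights as an advantage over linear Thompson sampling.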
Submission Number: 542