Instance-Optimal Best-Arm Identification in Non-Stationary Linear Bandits

18 Sept 2025 (modified: 11 Feb 2026), Submitted to ICLR 2026, CC BY 4.0
Keywords: fixed-budget best-arm identification, non-stationary linear bandits
TL;DR: We prove an instance-dependent lower bound for best-arm identification in non-stationary linear bandits and propose an algorithm that nearly achieves it.
Abstract: We investigate the fixed-budget best-arm identification (BAI) problem in non-stationary linear bandits. Concretely, given a fixed budget $T\in \mathbb{N}$, a finite arm set $\mathcal{X} \subset \mathbb{R}^d$, and a potentially adversarial sequence of unknown parameters $\lbrace \theta_t\rbrace_{t=1}^{T}$ (hence non-stationary), a learner aims to identify, with high probability, the arm with the largest average reward, $x_* = \arg\max_{x \in \mathcal{X}} x^\top\sum_{t=1}^T \theta_t$. It is well known that sampling arms from the G-optimal design yields a minimax-optimal error probability of order $\exp\left(-T\Delta_{(1)}^2 / d \right)$, where $\Delta_{(1)}$ denotes the average reward gap between the best and second-best arms. However, this can be suboptimal for certain arm sets, because the G-optimal design only minimizes the worst-case variance of each arm's estimated reward. To emphasize this, we establish an arm-set-dependent lower bound and show that sampling from the G-optimal design fails to achieve it. Motivated by this gap, we propose the *Adjacent-optimal design*, a specialization of the $\mathcal{XY}$-optimal design tailored to the non-stationary setting, and develop the **Adjacent-BAI** algorithm. We prove that the error probability of **Adjacent-BAI** matches our lower bound, which establishes the optimality of **Adjacent-BAI** and highlights the gap between the G-optimal and Adjacent-optimal designs.
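The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the function names `best_arm_and_gap` and `g_optimal_design` are hypothetical, the parameter sequence is assumed to be known (which the learner never has), and the G-optimal design is approximated with standard Frank-Wolfe (Fedorov-Wynn) iterations, assuming the arms span $\mathbb{R}^d$. The Adjacent-optimal design is not shown because the abstract does not define it.

```python
import numpy as np

def best_arm_and_gap(arms, thetas):
    """arms: (K, d) array; thetas: (T, d) array of per-round parameters.

    Returns the index of x_* and the gap Delta_(1) between the best and
    second-best average rewards. Illustration only: thetas is unknown
    to the learner in the actual problem.
    """
    theta_bar = thetas.mean(axis=0)        # same argmax as the sum over t
    rewards = arms @ theta_bar             # average reward of each arm
    order = np.argsort(rewards)[::-1]      # arm indices, best first
    gap = rewards[order[0]] - rewards[order[1]]
    return int(order[0]), float(gap)

def g_optimal_design(arms, iters=2000, tol=1e-9):
    """Approximate G-optimal design weights via Frank-Wolfe iterations.

    Minimizes max_x x^T A(lam)^{-1} x with A(lam) = sum_k lam_k x_k x_k^T.
    Assumes the arms span R^d, so A(lam) stays invertible.
    """
    K, d = arms.shape
    lam = np.full(K, 1.0 / K)              # start from the uniform design
    for _ in range(iters):
        A = arms.T @ (lam[:, None] * arms)
        A_inv = np.linalg.pinv(A)
        var = np.einsum("kd,de,ke->k", arms, A_inv, arms)
        k = int(np.argmax(var))            # arm with the largest variance
        g = var[k]
        if g <= d + tol:                   # Kiefer-Wolfowitz: optimum has g = d
            break
        step = (g / d - 1.0) / (g - 1.0)   # exact line-search step size
        lam = (1.0 - step) * lam
        lam[k] += step
    return lam
```

At the optimum the worst-case variance equals $d$ (the Kiefer-Wolfowitz theorem), which is where the factor $d$ in the $\exp\left(-T\Delta_{(1)}^2 / d \right)$ rate comes from.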
Primary Area: learning theory
Submission Number: 13867