Parameter-Free Dynamic Regret for Unconstrained Linear Bandits

Published: 17 Jul 2025, Last Modified: 06 Sept 2025 | EWRL 2025 Poster | CC BY 4.0
Keywords: Multi-armed Bandits, Online learning, Parameter Free
TL;DR: We propose the first unconstrained linear bandit algorithm to achieve optimal dynamic regret without any prior knowledge of the non-stationarity.
Abstract: We study dynamic regret minimization in online learning with an oblivious adversary and bandit feedback. In this setting, a learner must minimize the cumulative loss relative to an arbitrary sequence of comparators $u_1,\ldots,u_T$ in $\mathcal{W}\subseteq\mathbb{R}^d$, but receives only point-evaluation feedback on each round. We provide a simple approach to combining the guarantees of several bandit algorithms, allowing us to design algorithms which optimally adapt to the path-length $P_T=\sum_t \|u_t-u_{t-1}\|$ or the number of switches $S_T = \sum_t\mathbb{I} [u_t \neq u_{t-1} ]$ of an arbitrary comparator sequence. In particular, we provide the first algorithms for linear bandits which obtain the optimal regret guarantee of order $\mathcal{O}\big(\sqrt{(1+S_T) T}\big)$ up to poly-logarithmic terms without prior knowledge of $S_T$, resolving a long-standing open problem.
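The two non-stationarity measures in the abstract can be made concrete with a small sketch. The snippet below (an illustration, not part of the paper's algorithm) computes the path-length $P_T$ and the switch count $S_T$ for a given comparator sequence, using the Euclidean norm for $\|\cdot\|$ as an assumed choice:

```python
import numpy as np

def path_length(U):
    # P_T = sum_t ||u_t - u_{t-1}||, summed over consecutive comparators
    return sum(np.linalg.norm(U[t] - U[t - 1]) for t in range(1, len(U)))

def num_switches(U):
    # S_T = sum_t 1[u_t != u_{t-1}], the number of comparator changes
    return sum(int(not np.array_equal(U[t], U[t - 1])) for t in range(1, len(U)))

# Example comparator sequence in R^2: one switch of Euclidean length 5
U = np.array([[0.0, 0.0], [0.0, 0.0], [3.0, 4.0], [3.0, 4.0]])
print(path_length(U))   # 5.0
print(num_switches(U))  # 1
```

Note that $S_T$ counts only whether the comparator moved, while $P_T$ also weighs how far it moved; a guarantee scaling with $\sqrt{(1+S_T)T}$ is therefore the natural target for piecewise-constant comparator sequences.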
Confirmation: I understand that authors of each paper submitted to EWRL may be asked to review 2-3 other submissions to EWRL.
Serve As Reviewer: ~Alberto_Rumi1
Track: Regular Track: unpublished work
Submission Number: 114