Parameter-Free Dynamic Regret for Unconstrained Linear Bandits

Published: 17 Jul 2025, Last Modified: 06 Sept 2025 | EWRL 2025 Poster | CC BY 4.0
Keywords: Multi-armed Bandits, Online learning, Parameter Free
TL;DR: We propose the first unconstrained linear bandit algorithm to achieve optimal dynamic regret without any prior knowledge of the non-stationarity.
Abstract: We study dynamic regret minimization in online learning with an oblivious adversary and bandit feedback. In this setting, a learner must minimize the cumulative loss relative to an arbitrary sequence of comparators $u_1,\ldots,u_T$ in $\mathcal{W}\subseteq\mathbb{R}^d$, but receives only point-evaluation feedback on each round. We provide a simple approach to combining the guarantees of several bandit algorithms, allowing us to design algorithms which optimally adapt to the path-length $P_T=\sum_t \|u_t-u_{t-1}\|$ or the number of switches $S_T = \sum_t\mathbb{I} [u_t \neq u_{t-1} ]$ of an arbitrary comparator sequence. In particular, we provide the first algorithms for linear bandits which obtain the optimal regret guarantee of order $\mathcal{O}\big(\sqrt{(1+S_T) T}\big)$ up to poly-logarithmic terms without prior knowledge of $S_T$, resolving a long-standing open problem.
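The two non-stationarity measures in the abstract can be made concrete with a small sketch. The snippet below (an illustration, not part of the paper's algorithm) computes the path-length $P_T$ and the switch count $S_T$ for a given comparator sequence, using the Euclidean norm for $\|\cdot\|$ as an assumed choice:

```python
import numpy as np

def path_length(U):
    # P_T = sum_t ||u_t - u_{t-1}||, summed over consecutive comparators
    return sum(np.linalg.norm(U[t] - U[t - 1]) for t in range(1, len(U)))

def num_switches(U):
    # S_T = sum_t 1[u_t != u_{t-1}], the number of comparator changes
    return sum(int(not np.array_equal(U[t], U[t - 1])) for t in range(1, len(U)))

# Example comparator sequence in R^2: one switch of Euclidean length 5
U = np.array([[0.0, 0.0], [0.0, 0.0], [3.0, 4.0], [3.0, 4.0]])
print(path_length(U))   # 5.0
print(num_switches(U))  # 1
```

Note that $S_T$ counts only whether the comparator moved, while $P_T$ also weighs how far it moved; a guarantee scaling with $\sqrt{(1+S_T)T}$ is therefore the natural target for piecewise-constant comparator sequences.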
Confirmation: I understand that authors of each paper submitted to EWRL may be asked to review 2-3 other submissions to EWRL.
Serve As Reviewer: ~Alberto_Rumi1
Track: Regular Track: unpublished work
Submission Number: 114