Regret-Optimal List Replicable Bandit Learning: Matching Upper and Lower Bounds

Michael Chen; A. Pavan; N. V. Vinodchandran; Ruosong Wang; Lin Yang

Regret-Optimal List Replicable Bandit Learning: Matching Upper and Lower Bounds

Michael Chen, A. Pavan, N. V. Vinodchandran, Ruosong Wang, Lin Yang

Published: 22 Jan 2025, Last Modified: 31 Mar 2025ICLR 2025 PosterEveryoneRevisionsBibTeXCC BY 4.0

Keywords: Replicability, Regret Bound, Bandit

Abstract: This paper investigates *list replicability* [Dixon et al., 2023] in the context of multi-armed (also linear) bandits (MAB). We define an algorithm $A$ for MAB to be $(\ell,\delta)$-list replicable if with probability at least $1-\delta$, $A$ has at most $\ell$ traces in independent executions even with different random bits, where a trace means sequence of arms played during an execution. For $k$-armed bandits, although the total number of traces can be $\Omega(k^T)$ for a time horizon $T$, we present several surprising upper bounds that either independent of or logarithmic of $T$: (1) a $(2^{k},\delta)$-list replicable algorithm with near-optimal regret, $\widetilde{O}({\sqrt{kT}})$, (2) a $(O(k/\delta),\delta)$-list replicable algorithm with regret $\widetilde{O}\left(\frac{k}{\delta}\sqrt{kT}\right)$, (3) a $((k+1)^{B-1}, \delta)$-list replicable algorithm with regret $\widetilde{O}(k^{\frac{3}{2}}T^{{\frac{1}{2}}+2^{-(B+1)}})$ for any integer $B>1$. On the other hand, for the sublinear regret regime, we establish a matching lower bound on the list complexity (parameter $\ell$). We prove that there is no $(k-1,\delta)$-list replicable algorithm with $o(T)$-regret. This is optimal in list complexity in the sub-linear regret regime as there is a $(k, 0)$-list replicable algorithm with $O(T^{2/3})$-regret. We further show that for linear bandits with $d$-dimensional features, there is a $\widetilde{O}(d^2T^{1/2+2^{-(B+1)}})$-regret algorithm with $((2d+1)^{B-1},\delta)$-list replicability, for $B>1$, even when the number of possible arms can be infinite.

Primary Area: learning theory

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 10553

Loading