Regret-Optimal List Replicable Bandit Learning: Matching Upper and Lower Bounds

ICLR 2025 Conference Submission 10553 Authors

27 Sept 2024 (modified: 26 Nov 2024), ICLR 2025 Conference Submission, CC BY 4.0
Keywords: Replicability, Regret Bound, Bandit
Abstract: This paper investigates *list replicability* [Dixon et al., 2023] in the context of multi-armed (and linear) bandits (MAB). We define an algorithm $A$ for MAB to be $(\ell,\delta)$-list replicable if, with probability at least $1-\delta$, $A$ has at most $\ell$ traces across independent executions, even with different random bits, where a trace is the sequence of arms played during an execution. For $k$-armed bandits, although the total number of traces can be $\Omega(k^T)$ for a time horizon $T$, we present several surprising upper bounds that are either independent of $T$ or logarithmic in $T$: (1) a $(2^{k},\delta)$-list replicable algorithm with near-optimal regret $\widetilde{O}(\sqrt{kT})$, (2) a $(O(k/\delta),\delta)$-list replicable algorithm with regret $\widetilde{O}\left(\frac{k}{\delta}\sqrt{kT}\right)$, and (3) a $((k+1)^{B-1}, \delta)$-list replicable algorithm with regret $\widetilde{O}(k^{\frac{3}{2}}T^{\frac{1}{2}+2^{-(B+1)}})$ for any integer $B>1$. On the other hand, for the sublinear regret regime, we establish a matching lower bound on the list complexity (the parameter $\ell$): we prove that there is no $(k-1,\delta)$-list replicable algorithm with $o(T)$ regret. This is optimal in list complexity in the sublinear regret regime, as there is a $(k, 0)$-list replicable algorithm with $O(T^{2/3})$ regret.
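To make the notion of a trace concrete, the sketch below simulates a standard explore-then-commit strategy. Its exploration phase is a fixed round-robin schedule, so every run shares the same prefix and can differ only in which of the $k$ arms it commits to, giving at most $k$ distinct traces. This is an illustrative assumption about how a $(k, 0)$-list replicable algorithm could look, not necessarily the construction used in the paper. The function names, the Bernoulli reward model, and the exploration budget of roughly $(T/k)^{2/3}$ pulls per arm are all hypothetical choices made for the example.

```python
import random


def explore_then_commit(means, horizon, rng):
    """Return the trace (sequence of arms played) of one explore-then-commit run.

    `means` are Bernoulli reward means of a hypothetical bandit instance.
    """
    k = len(means)
    # Illustrative exploration budget per arm, roughly (T/k)^(2/3).
    m = max(1, int((horizon / k) ** (2 / 3)))
    m = min(m, horizon // k)
    totals = [0.0] * k
    trace = []
    # Deterministic round-robin exploration: this prefix is identical in every run.
    for t in range(m * k):
        arm = t % k
        reward = 1.0 if rng.random() < means[arm] else 0.0
        totals[arm] += reward
        trace.append(arm)
    # Commit to the empirical best arm; ties go to the lowest index.
    best = max(range(k), key=lambda a: (totals[a], -a))
    trace.extend([best] * (horizon - m * k))
    return trace


def count_distinct_traces(means, horizon, runs, seed=0):
    """Count how many different traces appear over `runs` independent executions."""
    rng = random.Random(seed)
    return len({tuple(explore_then_commit(means, horizon, rng)) for _ in range(runs)})


if __name__ == "__main__":
    arm_means = [0.2, 0.5, 0.8]  # hypothetical instance with k = 3 arms
    print(count_distinct_traces(arm_means, horizon=300, runs=50))
```

Whatever the rewards turn out to be, the number printed can never exceed the number of arms, because the only run-dependent decision is the single commit step.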
Primary Area: learning theory
Submission Number: 10553