Keywords: Online Learning, Multi-armed Bandits, Combinatorial Bandits, Sleeping Arms, Semi-bandit Feedback, Thompson Sampling, Gaussian Priors
Abstract: This paper provides theoretical analyses of worst-case regret upper and lower bounds for Gaussian randomized algorithms in semi-bandits with sleeping arms. In this setting, base arms may be unavailable in certain rounds, and only available base arms satisfying combinatorial constraints can be played simultaneously. We first introduce CTS-G, a randomized algorithm that achieves an $\tilde{O}(m\sqrt{NT})$ regret upper bound, where $T$ is the number of rounds, $N$ is the number of base arms, and up to $m$ base arms can be played per round.
Next, we present CL-SG, a randomized algorithm that achieves an $\tilde{O}(\sqrt{mNT})$ regret upper bound. In addition to these upper bounds, we establish regret lower bounds showing that both proposed algorithms are near-optimal.
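To make the setting concrete, the following is a minimal, illustrative sketch (not the paper's CTS-G or CL-SG) of a Gaussian-randomized, Thompson-style loop for sleeping combinatorial semi-bandits. It assumes a simple cardinality constraint with a top-$m$ oracle and hypothetical callbacks reward_fn and availability_fn; the actual algorithms and their tuned posterior variances are specified in the paper.

import numpy as np

def gaussian_randomized_semibandit(T, N, m, reward_fn, availability_fn, rng=None):
    """Illustrative sketch only; hypothetical interface, not the paper's algorithm.

    reward_fn(t, arms)     -> per-arm rewards for the played base arms (semi-bandit feedback)
    availability_fn(t)     -> indices of base arms available in round t (sleeping arms)
    """
    rng = np.random.default_rng() if rng is None else rng
    counts = np.zeros(N)   # number of times each base arm has been played
    means = np.zeros(N)    # empirical mean reward of each base arm

    for t in range(1, T + 1):
        avail = np.asarray(availability_fn(t))
        if avail.size == 0:
            continue
        # Gaussian perturbation of the empirical means; the noise scale
        # shrinks as an arm is played more often (1 / sqrt(count + 1) here).
        noise = rng.standard_normal(avail.size) / np.sqrt(counts[avail] + 1.0)
        scores = means[avail] + noise
        # Top-m oracle for a cardinality constraint: play up to m available
        # base arms with the largest sampled scores.
        k = min(m, avail.size)
        chosen = avail[np.argsort(scores)[-k:]]
        # Semi-bandit feedback: observe one reward per played base arm and
        # update that arm's running mean.
        rewards = np.asarray(reward_fn(t, chosen))
        counts[chosen] += 1
        means[chosen] += (rewards - means[chosen]) / counts[chosen]
    return means, counts

The top-$m$ selection stands in for a general combinatorial oracle over the available arms; with other constraints, only the oracle step would change.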
Submission Number: 76