Keywords: Online Learning, Multi-armed Bandits, Combinatorial Bandits, Sleeping Arms, Semi-bandit Feedback, Thompson Sampling, Gaussian Priors
Abstract: This paper provides theoretical analyses of worst-case regret upper and lower bounds for Gaussian randomized algorithms in semi-bandits with sleeping arms. In this setting, base arms may be unavailable in certain rounds, and only available base arms satisfying combinatorial constraints can be played simultaneously. We first introduce CTS-G, a randomized algorithm that achieves an $\tilde{O}(m\sqrt{NT})$ regret upper bound, where $T$ is the number of rounds, $N$ is the number of base arms, and up to $m$ base arms can be played per round.
Next, we present CL-SG, a randomized algorithm that achieves an $\tilde{O}(\sqrt{mNT})$ regret upper bound. In addition to these upper bounds, we establish regret lower bounds showing that both proposed algorithms are near-optimal.
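To make the setting concrete, the following is a minimal, illustrative sketch (not the paper's CTS-G or CL-SG) of a Gaussian-randomized, Thompson-style loop for sleeping combinatorial semi-bandits. It assumes a simple cardinality constraint with a top-$m$ oracle and hypothetical callbacks reward_fn and availability_fn; the actual algorithms and their tuned posterior variances are specified in the paper.

import numpy as np

def gaussian_randomized_semibandit(T, N, m, reward_fn, availability_fn, rng=None):
    """Illustrative sketch only; hypothetical interface, not the paper's algorithm.

    reward_fn(t, arms)     -> per-arm rewards for the played base arms (semi-bandit feedback)
    availability_fn(t)     -> indices of base arms available in round t (sleeping arms)
    """
    rng = np.random.default_rng() if rng is None else rng
    counts = np.zeros(N)   # number of times each base arm has been played
    means = np.zeros(N)    # empirical mean reward of each base arm

    for t in range(1, T + 1):
        avail = np.asarray(availability_fn(t))
        if avail.size == 0:
            continue
        # Gaussian perturbation of the empirical means; the noise scale
        # shrinks as an arm is played more often (1 / sqrt(count + 1) here).
        noise = rng.standard_normal(avail.size) / np.sqrt(counts[avail] + 1.0)
        scores = means[avail] + noise
        # Top-m oracle for a cardinality constraint: play up to m available
        # base arms with the largest sampled scores.
        k = min(m, avail.size)
        chosen = avail[np.argsort(scores)[-k:]]
        # Semi-bandit feedback: observe one reward per played base arm and
        # update that arm's running mean.
        rewards = np.asarray(reward_fn(t, chosen))
        counts[chosen] += 1
        means[chosen] += (rewards - means[chosen]) / counts[chosen]
    return means, counts

The top-$m$ selection stands in for a general combinatorial oracle over the available arms; with other constraints, only the oracle step would change.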
Submission Number: 76