Bandits Corrupted by Nature: Lower Bounds on Regret and Robust Optimistic Algorithms

Published: 27 Jan 2024, Last Modified: 27 Jan 2024. Accepted by TMLR.
Abstract: We study the corrupted bandit problem, i.e., a stochastic multi-armed bandit problem with $k$ unknown reward distributions that are heavy-tailed and corrupted by a history-independent adversary, or Nature. Specifically, the reward obtained by playing an arm comes from the corresponding heavy-tailed reward distribution with probability $1-\varepsilon \in (0.5,1]$ and from an arbitrary corruption distribution of unbounded support with probability $\varepsilon \in [0,0.5)$. First, we provide \textit{a problem-dependent lower bound on the regret} of any corrupted bandit algorithm. The lower bound indicates that the corrupted bandit problem is harder than the classical stochastic bandit problem with subGaussian or heavy-tailed rewards. Following that, we propose a novel UCB-type algorithm for corrupted bandits, namely \texttt{HubUCB}, that builds on Huber's estimator for robust mean estimation. Leveraging a novel concentration inequality for Huber's estimator, we prove that \texttt{HubUCB} achieves a near-optimal regret upper bound. Since computing Huber's estimator has quadratic complexity, we further introduce a sequential version of Huber's estimator that exhibits linear complexity. We leverage this sequential estimator to design \texttt{SeqHubUCB}, which enjoys similar regret guarantees while reducing the computational burden. Finally, we experimentally illustrate the efficiency of \texttt{HubUCB} and \texttt{SeqHubUCB} in solving corrupted bandits for different reward distributions and different levels of corruption.
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: We removed the motivation example and corrected typos.
Supplementary Material: pdf
Assigned Action Editor: ~Andras_Gyorgy1
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 530
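
Illustrative sketch (not from the paper): the abstract describes rewards drawn from a heavy-tailed distribution with probability $1-\varepsilon$ and from an arbitrary corruption distribution with probability $\varepsilon$, and a mean estimate based on Huber's estimator. The Python snippet below is a minimal sketch of both ideas under stated assumptions. The function names `sample_corrupted_rewards` and `huber_mean`, the threshold `beta`, the Student-t inliers, and the constant outliers are all hypothetical choices made for illustration. The fixed-point iteration is one generic way to compute a Huber location estimate; it is not claimed to be the procedure used in \texttt{HubUCB} or the sequential estimator used in \texttt{SeqHubUCB}.

```python
import numpy as np


def sample_corrupted_rewards(rng, n, eps, inlier_sampler, outlier_sampler):
    """Draw n rewards; each one is an outlier with probability eps.

    inlier_sampler and outlier_sampler are hypothetical callables that
    take a sample size and return an array of that length.
    """
    is_outlier = rng.random(n) < eps
    return np.where(is_outlier, outlier_sampler(n), inlier_sampler(n))


def huber_mean(x, beta, tol=1e-10, max_iter=1000):
    """Huber location estimate: a root of sum_i clip(x_i - theta, -beta, beta) = 0.

    Computed by a simple fixed-point iteration started at the median
    (an assumption of this sketch, not the paper's algorithm).
    """
    x = np.asarray(x, dtype=float)
    theta = float(np.median(x))
    for _ in range(max_iter):
        # Average of the clipped residuals; zero at the Huber estimate.
        step = float(np.mean(np.clip(x - theta, -beta, beta)))
        theta += step
        if abs(step) < tol:
            break
    return theta


rng = np.random.default_rng(0)
true_mean = 1.0
rewards = sample_corrupted_rewards(
    rng,
    n=5000,
    eps=0.1,
    # Heavy-tailed inliers: Student-t with 3 degrees of freedom, shifted.
    inlier_sampler=lambda m: true_mean + rng.standard_t(3, size=m),
    # Corruption: a constant far from the true mean (arbitrary choice).
    outlier_sampler=lambda m: np.full(m, 1000.0),
)
print("empirical mean:", np.mean(rewards))
print("Huber estimate:", huber_mean(rewards, beta=1.0))
```

In this toy setting the empirical mean is pulled far toward the outliers, while the Huber estimate stays near the inlier mean up to a bias that depends on `eps` and `beta`.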