Robust Stochastic Bandit algorithms to defend against Oracle attack using Sample Dropout

Jayanth Yetukuri, Yang Liu

Published: 2022, Last Modified: 12 May 2023, Big Data 2022, Readers: Everyone

Abstract: This study investigates robust algorithms for stochastic multi-armed bandit problems with adversarially corrupted rewards. We consider a novel setup of stochastic bandits in which the corruptions are sporadic and adaptive to the learner’s arm selection strategy, with no upper limit on the total corruption budget. We first introduce an attacker model called the Fractional Oracle Attack (FOA) and show its efficacy against the standard UCB and ε-greedy algorithms, giving sufficient conditions for its success under $\mathcal{O}(\log T)$ attack cost. We then present two robust algorithms, Sample Dropout-UCB (SD-UCB) and Sample Dropout-ε-greedy (SD-εG), to defend against FOA. The core idea of our algorithms is to use reward dropout during sample mean estimation, thereby tolerating a significant amount of quantified corruption. Both algorithms are significantly more robust than contemporary robust algorithms and achieve regret on the order of $\mathcal{O}(\log T)$.
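The idea of dropping rewards during sample mean estimation can be illustrated with a minimal sketch. This is not the paper's SD-UCB: the abstract does not specify the dropout rule, so the code below assumes that each stored reward is dropped independently with probability `drop_prob` before averaging, and it uses the standard UCB1 exploration bonus. The names `dropout_mean`, `sd_ucb_sketch`, and `drop_prob`, the Bernoulli reward model, and the absence of any attacker are all assumptions made for illustration.

```python
import math
import random


def dropout_mean(rewards, drop_prob, rng):
    """Sample mean over rewards that survive independent random dropout.

    Assumption: each reward is dropped with probability drop_prob.
    If every reward is dropped, fall back to the plain sample mean.
    """
    kept = [r for r in rewards if rng.random() >= drop_prob]
    if not kept:
        kept = rewards
    return sum(kept) / len(kept)


def sd_ucb_sketch(arm_means, horizon, drop_prob=0.2, seed=0):
    """UCB1-style loop whose mean estimate uses dropout_mean (illustrative only).

    arm_means: true Bernoulli means of the arms (hypothetical environment).
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    history = [[] for _ in range(n_arms)]

    def index(arm, t):
        # Dropout-based mean estimate plus the usual UCB1 exploration bonus.
        bonus = math.sqrt(2.0 * math.log(t) / len(history[arm]))
        return dropout_mean(history[arm], drop_prob, rng) + bonus

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # pull every arm once to initialise its history
        else:
            arm = max(range(n_arms), key=lambda a: index(a, t))
        # Uncorrupted Bernoulli reward; no attacker is modelled here.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        history[arm].append(reward)

    return [len(h) for h in history]


if __name__ == "__main__":
    print(sd_ucb_sketch([0.2, 0.8], horizon=2000))
```

In this sketch the dropout only randomises which samples enter the mean. How the paper selects the rewards to drop, and how that choice yields tolerance to FOA, is not stated in the abstract and is not reproduced here.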
