Delayed Adversarial Attacks on Stochastic Bandits

ICLR 2026 Conference Submission13813 Authors

18 Sept 2025 (modified: 08 Oct 2025)
License: CC BY 4.0
Keywords: online learning, adversarial attacks, multi-armed bandits
Abstract: We study adversarial attacks on stochastic bandits in which, unlike previous work, the malicious attacker's corruption is delayed and begins only after the learning process has started. We focus on strong attacks and capture the setting in which the attacker lacks information about when learning began, a limitation that can dramatically reduce the effectiveness of the attack. We introduce a more general framework for studying adversarial attacks on stochastic bandit algorithms, providing new definitions of the success and profitability of an attack that account for a variable corruption start time. We then analyze success and profitability for different families of algorithms, such as UCB and $\epsilon$-greedy, against an omniscient attacker. In particular, we derive upper and lower bounds on the number of target-arm pulls, showing that our bounds are tight up to a sublinear factor. Finally, we identify an intuitive condition characterizing when an attack can succeed as a function of its starting time, and we evaluate the tightness of our theoretical bounds on synthetic instances.
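The setting described in the abstract can be illustrated with a toy simulation. This is our own minimal sketch, not the authors' construction: the attack rule (zeroing the observed reward of every non-target arm once the corruption starts) is a simple stand-in for a "strong attack", and all names and parameters are assumptions for illustration.

```python
import random

def eps_greedy_delayed_attack(means, target, t0, T=5000, eps=0.1, seed=0):
    """Simulate eps-greedy on Bernoulli arms. From round t0 onward, an
    omniscient attacker corrupts the observed reward of every non-target
    arm to 0 (a simple stand-in for a strong attack).
    Returns the number of target-arm pulls over T rounds."""
    rng = random.Random(seed)
    K = len(means)
    counts = [0] * K      # pulls per arm
    sums = [0.0] * K      # sum of observed (possibly corrupted) rewards
    target_pulls = 0
    for t in range(T):
        if t < K:
            arm = t % K   # pull each arm once to initialize
        elif rng.random() < eps:
            arm = rng.randrange(K)          # explore
        else:
            arm = max(range(K), key=lambda a: sums[a] / counts[a])  # exploit
        reward = 1.0 if rng.random() < means[arm] else 0.0
        if t >= t0 and arm != target:
            reward = 0.0  # delayed corruption: attack starts at round t0
        counts[arm] += 1
        sums[arm] += reward
        if arm == target:
            target_pulls += 1
    return target_pulls
```

Running this with an early versus a late corruption start shows the phenomenon the paper studies: an attack launched from the first round drives the learner to the (suboptimal) target arm, while the same attack started late fails, because the learner's empirical estimates of the truly optimal arm are already too well established to be overturned in the remaining rounds.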
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 13813