Adversarial Post-Action Attacks on Dueling Bandits

Published: 26 Mar 2026, Last Modified: 07 May 2026 | OpenReview Archive Direct Upload | Everyone | CC BY 4.0
Abstract: Dueling bandit algorithms excel at learning from pairwise comparisons, offering robust performance guarantees in benign environments. However, recent evidence suggests that even state-of-the-art methods can be highly susceptible to adversarial manipulation. In this work, we introduce and analyze a post-action attack model on the Relative Upper Confidence Bound (RUCB) algorithm, a widely used dueling bandit algorithm. Unlike the pre-action attacks considered in existing work, where the attacker can observe all comparisons beforehand, our post-action adversary intercepts only the feedback from the specific arm pair chosen by the learner at each round. Despite this limited access, we show that such targeted interference can coerce the learner into favoring a predetermined target arm for almost the entire time horizon. Specifically, the attacker incurs a total cost of only $\mathcal{O}(K \ln T)$ while ensuring that the learner pulls the target arm in $T - \mathcal{O}(K^2 \ln T)$ comparisons, where $K$ is the number of arms and $T$ is the time horizon. These findings underscore the vulnerability of dueling bandit algorithms to post-action adversarial interference and highlight the need for more robust dueling bandit strategies.
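To make the post-action setting concrete, the following is a minimal sketch of a single round of feedback interception. It is illustrative only: the function name `post_action_round`, the preference matrix `pref`, the flip rule, and the unit cost per flip are all assumptions made for this example, and neither the RUCB learner nor the paper's actual attack strategy is reproduced here.

```python
import random


def post_action_round(pair, pref, target, rng):
    """Simulate one duel under a hypothetical post-action attacker.

    pair   -- (i, j), the two arms the learner has already chosen
    pref   -- pref[i][j] is the probability that arm i beats arm j
    target -- the arm the attacker wants the learner to favor
    rng    -- a random.Random instance
    Returns (observed_winner, cost).
    """
    i, j = pair
    # The environment draws the true outcome of this duel only.
    winner = i if rng.random() < pref[i][j] else j
    cost = 0
    # Assumed attack rule (not the paper's strategy): the attacker sees
    # only this pair's outcome and flips it if the target arm lost.
    if target in pair and winner != target:
        winner = target
        cost = 1  # assumption: each flipped comparison costs one unit
    return winner, cost


if __name__ == "__main__":
    rng = random.Random(0)
    # Three arms; arm 0 beats every other arm with probability 0.8.
    pref = [[0.5, 0.8, 0.8],
            [0.2, 0.5, 0.6],
            [0.2, 0.4, 0.5]]
    total_cost = 0
    for _ in range(1000):
        # Placeholder learner: uniformly random pairs, NOT RUCB.
        pair = tuple(rng.sample(range(3), 2))
        _, cost = post_action_round(pair, pref, target=2, rng=rng)
        total_cost += cost
    print("total attack cost:", total_cost)
```

Against a uniformly random learner this naive rule pays a cost that grows linearly in the number of rounds. The abstract's $\mathcal{O}(K \ln T)$ cost bound depends on how the attack interacts with RUCB's own arm selection, which this sketch does not model.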