Keywords: online learning, multi-armed bandits, adversarial attacks
Abstract: Combinatorial Multi-Armed Bandits (CMABs) are a widely adopted tool for addressing online learning problems with a combinatorial nature. Adversarial attacks, on the other hand, represent a significant threat to machine learning algorithms: a malicious entity intentionally manipulates data or feedback to deceive learning algorithms, undermining their performance and reliability. While CMABs and adversarial machine learning have received extensive attention as distinct subjects, CMABs under adversarial attacks remain underinvestigated. We propose algorithms to attack CMABs, providing theoretical guarantees on success and cost in three different scenarios, each differing in the assumptions on the rewards. First, we study attacks when rewards are bounded and means are positive. Then, we consider two extensions in which rewards have unbounded support, distinguishing between positive and arbitrary means. For each scenario, we design two attack strategies. First, we assume that the attacker is omniscient, i.e., knows the problem instance; then we extend the attack to a more realistic setting where the learner and the attacker have the same knowledge of the problem. We show that our attack strategies are successful, i.e., the learner selects a target superarm $T - o(T)$ times, except in some degenerate cases. We also show that in most settings the attack cost is sublinear in $T$. Finally, we validate our theoretical results via numerical experiments on synthetic instances.
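To make the attack idea concrete, the following is a minimal sketch (not the paper's algorithm) of an omniscient reward-manipulation attack on a toy CMAB with semi-bandit feedback. The learner runs a simple UCB-style rule over base arms and picks the size-2 superarm with the highest UCB sum; the attacker, knowing the instance, pushes the observed reward of every non-target base arm to zero so that a suboptimal target superarm dominates. All arm means, the target choice, and the learner here are hypothetical illustrations, not taken from the submission.

```python
import numpy as np

rng = np.random.default_rng(0)
n_arms = 4
true_means = np.array([0.4, 0.5, 0.7, 0.8])   # hypothetical instance (bounded rewards)
superarms = [(i, j) for i in range(n_arms) for j in range(i + 1, n_arms)]
target = (0, 1)                                # suboptimal superarm the attacker promotes

T = 5000
counts = np.zeros(n_arms)   # pulls per base arm
sums = np.zeros(n_arms)     # observed (post-attack) reward per base arm
attack_cost = 0.0           # total magnitude of reward perturbations
target_picks = 0

for t in range(1, T + 1):
    # UCB index per base arm; unpulled arms get infinite optimism
    n = np.maximum(counts, 1)
    ucb = np.where(counts > 0, sums / n + np.sqrt(2 * np.log(t) / n), np.inf)
    chosen = max(superarms, key=lambda s: ucb[s[0]] + ucb[s[1]])
    for a in chosen:
        r = rng.binomial(1, true_means[a])     # Bernoulli reward
        if a not in target:
            attack_cost += r                    # cost of zeroing this observation
            r = 0.0                             # attacker corrupts the feedback
        counts[a] += 1
        sums[a] += r
    target_picks += (chosen == target)

print(f"target picked in {target_picks}/{T} rounds, attack cost {attack_cost:.1f}")
```

Under this corruption, non-target arms are pulled only while their exploration bonus exceeds the target arms' means, so the target superarm is selected in all but a vanishing fraction of rounds and the attack cost grows sublinearly in $T$, mirroring the kind of guarantee the abstract states.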
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 14433