Abstract: Combinatorial multi-armed bandit (CMAB) is a fundamental framework widely used in networked systems to maximize cumulative rewards under uncertainty. Real-world applications such as federated learning and content delivery networks often involve feedback that may be corrupted by adversarial attacks or network disruptions. In this paper, we study contextual CMAB ($\mathrm{C}^{2}$MAB) with adversarial corruptions, where feedback for base arms within any selected super arm can be corrupted by an adversary. We focus on $L_{1}$-norm smooth reward functions and both $L_{1}$- and $L_{\infty}$-norm corruption measures, establishing tight regret upper bounds for each scenario. Additionally, we provide the first lower bounds for $\mathrm{C}^{2}$MAB under corruptions, confirming the optimality of our proposed algorithm. To broaden applicability, we further extend our algorithm to a more general $\mathrm{C}^{2}$MAB setting with probabilistically triggered arms. Empirical validation demonstrates significant improvements across synthetic and real-world datasets, with applications in contextual latency-critical federated learning, user-specific online content delivery, and 360° VR video streaming.
External IDs:dblp:conf/infocom/Wang0Z025