Adversarial Combinatorial Bandits with Switching Cost and Arm Selection Constraints

Published: 01 Jan 2024 · Last Modified: 10 Feb 2025 · INFOCOM 2024 · CC BY-SA 4.0
Abstract: The multi-armed bandit (MAB) framework is widely used for sequential decision-making under uncertainty, with applications in many domains, including computer and communication networks. To address the growing complexity of real-world systems and their operational requirements, researchers have proposed and studied numerous extensions of the basic MAB framework. In this paper, we study an adversarial MAB problem, motivated by real-world systems, that features combinatorial semi-bandit arms, switching costs, and anytime cumulative arm selection constraints. To tackle this challenging problem, we introduce the Block-structured Follow-the-Regularized-Leader (B-FTRL) algorithm. Our approach employs a hybrid Tsallis-Shannon entropy regularizer for arm selection and incorporates a block structure that divides time into blocks to reduce arm switching costs. Our theoretical analysis shows that B-FTRL achieves a reward regret bound of $O\big(T^{\frac{2a-b+1}{1+a}} + T^{\frac{b}{1+a}}\big)$ and a switching regret bound of $O\big(T^{\frac{1}{1+a}}\big)$, where $a$ and $b$ are tunable algorithm parameters. By carefully selecting $a$ and $b$, we limit the total regret to $O(T^{2/3})$ while satisfying the arm selection constraints in expectation. This improves on the state-of-the-art regret bound of $O(T^{3/4})$ with expected constraint violation $o(1)$, which was derived in the less challenging stochastic reward setting. Additionally, we provide a high-probability constraint violation bound of $O(\sqrt{T})$. To validate the effectiveness of the proposed B-FTRL algorithm, we present numerical results demonstrating its advantages over existing methods.
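To make the two core ideas in the abstract concrete, the following is a minimal sketch of an FTRL step with a hybrid Tsallis-Shannon regularizer combined with block-structured arm switching. It covers only the simple single-arm (non-combinatorial, unconstrained) case; the regularizer form, the step sizes `eta` and `gamma`, the Tsallis parameter `alpha`, and the block length are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np
from scipy.optimize import minimize

def ftrl_distribution(cum_loss, eta, gamma, alpha=0.5):
    """One FTRL step with an assumed hybrid Tsallis-Shannon regularizer:

        p = argmin_{p in simplex}  <p, L>
            - (1/eta) * sum_i p_i^alpha        # Tsallis part, alpha in (0, 1)
            + gamma   * sum_i p_i * log(p_i)   # Shannon part
    """
    K = len(cum_loss)

    def objective(p):
        p = np.clip(p, 1e-12, 1.0)            # keep log/power terms finite
        tsallis = -np.sum(p ** alpha) / eta
        shannon = gamma * np.sum(p * np.log(p))
        return p @ cum_loss + tsallis + shannon

    res = minimize(
        objective,
        np.full(K, 1.0 / K),                   # start from the uniform distribution
        method="SLSQP",
        bounds=[(1e-12, 1.0)] * K,
        constraints=[{"type": "eq", "fun": lambda p: p.sum() - 1.0}],
    )
    return res.x

# Block structure: the sampled arm is held fixed within each block, so the
# number of switches is at most T / block_len rather than T.
rng = np.random.default_rng(0)
K, T, block_len = 5, 3000, 30                  # block length is a tuning knob
cum_loss = np.zeros(K)
for t in range(T):
    if t % block_len == 0:                     # switch only at block boundaries
        p = ftrl_distribution(cum_loss, eta=0.1, gamma=0.01)
        p = p / p.sum()                        # renormalize solver output
        arm = rng.choice(K, p=p)
    loss = rng.uniform()                       # stand-in for the adversarial loss
    # importance-weighted loss estimate for the played arm
    cum_loss[arm] += loss / max(p[arm], 1e-12)
```

The block structure is what trades reward regret against switching regret: longer blocks mean fewer switches but slower adaptation, which is the tension the paper's parameters $a$ and $b$ tune.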