Neural Constrained Combinatorial BanditsDownload PDFOpen Website

Published: 2023, Last Modified: 21 Feb 2024INFOCOM 2023Readers: Everyone
Abstract: Constrained combinatorial contextual bandits have emerged as trending tools in intelligent systems and networks to model reward and cost signals under combinatorial decision-making. On one hand, both signals are complex functions of the context, e.g., in federated learning, training loss (negative reward) and energy consumption (cost) are nonlinear functions of edge devices’ system conditions (context). On the other hand, there are cumulative constraints on costs, e.g., the accumulated energy consumption should be budgeted by energy resources. Besides, real-time systems often require such constraints to be guaranteed anytime or in each round, e.g., ensuring anytime fairness for task assignment to maintain the credibility of crowdsourcing platforms for workers. This setting imposes a challenge on how to simultaneously achieve reward maximization while subjecting to anytime cumulative constraints. To address such challenge, we propose a primal-dual algorithm (Neural-PD) whose primal component adopts multi-layer perceptrons to estimate reward and cost functions, and its dual component estimates the Lagrange multiplier with the virtual queue. By integrating neural tangent kernel theory and Lyapunov-drift techniques, we prove Neural-PD achieves a sharp regret bound and a zero constraint violation. We also show Neural-PD outperforms existing algorithms with extensive experiments on both synthetic and real-world datasets.
0 Replies

Loading