Global Reward Maximization with Partial Feedback via Differentially Private Distributed Linear Bandits
Abstract: In this paper, we study the problem of global reward maximization with only partial distributed feedback. This problem is motivated by several real-world applications (e.g., cellular network configuration, dynamic pricing, and policy selection) where an action taken by a central entity influences a large population that contributes to the global reward. However, collecting such reward feedback from the entire population not only incurs a prohibitively high cost but also often raises privacy concerns. To tackle this problem, we formulate it as differentially private distributed linear bandits (DP-DLB), where only a subset of users from the population (called clients) are selected to participate in the learning process, and the central server learns the global model from such partial feedback by iteratively aggregating these clients' local feedback in a differentially private fashion. We then propose a unified algorithmic learning framework, called differentially private distributed phased elimination (DP-DPE), which enables us to naturally integrate popular differential privacy (DP) models (including central DP, local DP, and shuffle DP) into the learning process. Furthermore, we analyze the performance of the DP-DPE algorithm and show that it achieves both sublinear regret and sublinear communication cost. Interestingly, we highlight that DP-DPE achieves privacy protection ``for free'', in the sense that the additional cost due to privacy can be a lower-order additive term. Finally, we conduct simulations to corroborate our theoretical results and demonstrate the effectiveness of DP-DPE in terms of regret, communication cost, and privacy guarantees.
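To make the core idea of privately aggregating clients' local feedback concrete, here is a minimal sketch of a central-DP aggregation step using the standard Gaussian mechanism. This is an illustrative assumption, not the paper's actual DP-DPE algorithm: the function names (`gaussian_noise_scale`, `private_average`), the scalar-feedback setting, and the clipping bound are all hypothetical simplifications of the server-side averaging described in the abstract.

```python
import math
import random


def gaussian_noise_scale(sensitivity: float, epsilon: float, delta: float) -> float:
    # Standard Gaussian-mechanism calibration:
    # sigma = sensitivity * sqrt(2 * ln(1.25/delta)) / epsilon
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def private_average(local_feedback, epsilon, delta, clip=1.0, rng=None):
    """Aggregate clients' local (scalar) feedback into a noisy global estimate.

    Each client's value is clipped to [-clip, clip], so changing one client's
    contribution moves the average by at most 2*clip/n (the sensitivity).
    """
    rng = rng or random.Random()
    n = len(local_feedback)
    clipped = [max(-clip, min(clip, y)) for y in local_feedback]
    sigma = gaussian_noise_scale(2.0 * clip / n, epsilon, delta)
    # The server only ever sees this noisy aggregate, not raw client feedback.
    return sum(clipped) / n + rng.gauss(0.0, sigma)
```

Note that the noise scale shrinks as the number of participating clients grows, which is the intuition behind the "privacy for free" phenomenon: with enough clients, the privacy noise is dominated by the statistical estimation error already present in the learning process.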
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Branislav_Kveton1
Submission Number: 28