Connecting Thompson Sampling and UCB: Towards More Efficient Trade-offs Between Privacy and Regret

Published: 01 May 2025, Last Modified: 18 Jun 2025ICML 2025 posterEveryoneRevisionsBibTeXCC BY 4.0
Abstract: We address differentially private stochastic bandit problems by leveraging Thompson Sampling with Gaussian priors and Gaussian differential privacy (GDP). We propose DP-TS-UCB, a novel parametrized private algorithm that enables trading off privacy and regret. DP-TS-UCB satisfies $ \tilde{O} \left(T^{0.25(1-\alpha)}\right)$-GDP and achieves $O \left(K\ln^{\alpha+1}(T)/\Delta \right)$ regret bounds, where $K$ is the number of arms, $ \Delta$ is the sub-optimality gap, $T$ is the learning horizon, and $\alpha \in [0,1]$ controls the trade-off between privacy and regret. Theoretically, DP-TS-UCB relies on anti-concentration bounds for the Gaussian distributions, linking the exploration mechanisms of Thompson Sampling and Upper Confidence Bound, which may be of independent research interest.
Lay Summary: Since machine learning algorithms without consideration of differential privacy constantly expose private information, this paper works on developing efficient differentially private sequential learning algorithms under bandit feedback model. It is centered around understanding how to trade off privacy and regret, and finding the fundamental limits that prevent private sequential learning algorithms from having a good trade-off.
Primary Area: Theory->Online Learning and Bandits
Keywords: bandits, Thompson Sampling, privacy
Submission Number: 12729
Loading