Abstract: In this paper, we propose a stochastic linear contextual bandit algorithm that ensures local differential privacy (LDP). Our algorithm is $(\epsilon,\delta)$-locally differentially private and guarantees $\tilde O\left(\sqrt{d}T^{3/4}\right)$ regret with high probability. This is a factor of $d^{1/4}$ improvement over the previous state-of-the-art (SOTA) \citep{zheng2020locally}. Furthermore, our regret guarantee improves to $\tilde O\left(\sqrt{dT}\right)$ when the action space is well-conditioned. This rate matches the optimal non-private asymptotic rate, thus demonstrating that we can achieve privacy for free even in the stringent LDP model. Our algorithm is the first to achieve $\tilde O(\sqrt{T})$ regret in a privacy setting that is stronger than the central setting.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Lijun_Zhang1
Submission Number: 2495