Follow-the-Perturbed-Leader for Adversarial Bandits: Heavy Tails, Robustness, and Privacy

Published: 16 Jan 2024, Last Modified: 14 Apr 2024, ICLR 2024 Conference Withdrawn Submission
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Adversarial Bandits, Online Learning, Heavy Tails, Follow-the-Perturbed-Leader
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: We study adversarial bandit problems with potentially heavy-tailed losses. Unlike standard settings with non-negative and bounded losses, handling negative and unbounded losses introduces a unique challenge in controlling the "stability" of the algorithm and hence the regret. To tackle this challenge, we propose a Follow-the-Perturbed-Leader (FTPL) based learning algorithm. Notably, our method achieves (nearly) optimal worst-case regret while eliminating an undesirable assumption inherent in the Follow-the-Regularized-Leader (FTRL) based approach. Thanks to this distinctive advantage, our algorithmic framework finds novel applications in two important scenarios with unbounded heavy-tailed losses. For adversarial bandits with heavy-tailed losses and Huber contamination, which we call the robust setting, our algorithm is the first to match the lower bound (up to a $\operatorname{polylog}(K)$ factor, where $K$ is the number of actions). In the private setting, where true losses lie in a bounded range (e.g., $[0,1]$) but additional Local Differential Privacy (LDP) guarantees are required, our algorithm improves the best-known regret bound by a $\operatorname{polylog}(T)$ factor, where $T$ is the total number of rounds. Furthermore, compared to state-of-the-art FTRL-based algorithms, our FTPL-based algorithm has a more streamlined design: it eliminates the need for additional explicit exploration and only requires keeping the absolute value of the loss estimates below a predetermined threshold.
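
To make the abstract's description concrete, below is a minimal, illustrative FTPL sketch for a $K$-armed adversarial bandit: sample fresh perturbations each round, play the arm with the smallest perturbed cumulative loss estimate, and keep the magnitude of each new loss estimate below a threshold. This is not the paper's algorithm; the perturbation law (Fréchet here), the learning rate `eta`, the threshold `M`, the geometric-resampling probability estimate, and the function names are all assumptions made for illustration only.

```python
import numpy as np

def ftpl_bandit(loss_fn, K, T, eta=0.1, M=100.0, n_resample=100, seed=0):
    """Illustrative FTPL sketch (hypothetical parameters, not the paper's spec)."""
    rng = np.random.default_rng(seed)
    L_hat = np.zeros(K)        # cumulative (thresholded) loss estimates
    total_loss = 0.0
    for t in range(T):
        # Fréchet(1) perturbations via inverse exponentials (assumed choice).
        z = rng.standard_exponential(K) ** (-1.0)
        arm = int(np.argmin(eta * L_hat - z))

        loss = loss_fn(t, arm)  # adversary's (possibly heavy-tailed) loss
        total_loss += loss

        # Geometric resampling to estimate 1 / P(arm chosen): redraw
        # perturbations until the same arm would be selected again.
        inv_prob = n_resample
        for m in range(1, n_resample + 1):
            z2 = rng.standard_exponential(K) ** (-1.0)
            if int(np.argmin(eta * L_hat - z2)) == arm:
                inv_prob = m
                break

        est = loss * inv_prob                 # importance-weighted estimate
        est = float(np.clip(est, -M, M))      # keep |estimate| below threshold M
        L_hat[arm] += est
    return total_loss
```

A toy run could use a heavy-tailed synthetic adversary, e.g. `ftpl_bandit(lambda t, a: np.random.default_rng(t).standard_t(df=2) + (0.5 if a != 0 else 0.0), K=5, T=1000)`; the thresholding step is what keeps occasional huge importance-weighted estimates from destabilizing the cumulative totals.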
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7706