Efficient Algorithms for Lipschitz Bandits

14 May 2024 (modified: 06 Nov 2024) · Submitted to NeurIPS 2024 · CC BY 4.0
Keywords: bandits
Abstract: Lipschitz bandits are a fundamental framework for modeling sequential decision-making problems with large, structured action spaces, and the framework has been applied in a variety of areas. Previous algorithms, such as the Zooming algorithm, achieve near-optimal regret with $O(T^2)$ time complexity and $O(T)$ arms stored in memory, where $T$ denotes the time horizon. In practice, however, learners may be unable to store a large number of arms in memory. In this paper, we study the bounded memory stochastic Lipschitz bandits problem, where the algorithm may store only a limited number of arms at any time. We propose algorithms that achieve near-optimal regret with $O(T)$ time complexity and $O(1)$ arms stored, both of which are nearly optimal and improve on the state of the art. Our numerical results further demonstrate the efficiency of these algorithms.
Primary Area: Bandits
Submission Number: 10271
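To illustrate the baseline the abstract compares against, the following is a minimal, hypothetical sketch of the classical Zooming algorithm on a one-dimensional Lipschitz bandit. All names (`zooming`, `conf`, `probe`) and the specific confidence-radius constant are illustrative assumptions, not the paper's method; the point is that the set of active arms can grow with $T$, which is exactly the $O(T)$ memory cost the bounded-memory setting avoids.

```python
import math
import random

def zooming(reward, horizon, domain=(0.0, 1.0), seed=0):
    """Illustrative sketch of the Zooming algorithm (not the paper's algorithm)."""
    rng = random.Random(seed)
    active = []  # each entry: [arm_location, pull_count, empirical_mean]

    def conf(n):
        # Confidence radius of an arm pulled n times; unplayed arms
        # get an infinite radius so they cover every point.
        return math.sqrt(2.0 * math.log(horizon) / n) if n > 0 else float("inf")

    total_reward = 0.0
    for _ in range(horizon):
        # Activation rule: probe a point; if no active arm's confidence
        # ball covers it, activate it as a new arm.
        probe = rng.uniform(*domain)
        if not active or all(abs(probe - a) > conf(n) for a, n, _ in active):
            active.append([probe, 0, 0.0])
        # Selection rule: pull the active arm with the largest UCB index.
        best = max(active, key=lambda e: e[2] + 2.0 * conf(e[1]))
        r = reward(best[0])
        total_reward += r
        best[1] += 1
        best[2] += (r - best[2]) / best[1]  # incremental mean update
    return total_reward, len(active)
```

Because `active` only grows, the arm set can reach $O(T)$ entries over the horizon; the bounded-memory algorithms proposed in the paper keep only $O(1)$ arms instead.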
