Universal Algorithm for Extreme Bandits with the Minimal Complexities

24 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: extreme bandits, online optimization, heavy-tails, non-iid data, non-parametric
Abstract: The Multi-Armed Bandit is a classic reinforcement learning problem that exemplifies the exploration–exploitation trade-off. When extreme values rather than expected values are of interest, the problem becomes the Extreme Bandit. The motivation for this work comes from black-box optimization and meta-learning, where the goal is to find the best value of a target function across different search spaces or using multiple search heuristics. Previous work on the extreme bandit problem has assumed that rewards are drawn in an i.i.d. manner, which severely limits the applicability of this class of algorithms. In this paper, with minimal time and space complexity and minimal assumptions about the reward distribution, we present a novel algorithm and provide its analysis. Numerical experiments highlight the performance of the proposed algorithm relative to existing approaches.
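To make the extreme-bandit objective concrete, below is a minimal Python sketch of the setting: the learner is judged by the largest reward it collects over a horizon, compared against the maximum of the same number of draws from the single best arm. The Pareto-tailed arms and the simple greedy-on-max policy are illustrative assumptions for this sketch, not the algorithm proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
K, T, eps = 3, 10_000, 0.05
# Heavy-tailed arms (assumed): Pareto tails; smaller shape => heavier tail.
shapes = np.array([2.5, 1.8, 1.2])

def pull(k):
    # One reward from arm k; heavier tails yield larger extremes.
    return rng.pareto(shapes[k]) + 1.0

# Toy policy: pull the arm with the largest reward seen so far,
# with epsilon-greedy exploration (illustration only).
best_seen = np.array([pull(k) for k in range(K)])
policy_max = best_seen.max()
for t in range(K, T):
    k = rng.integers(K) if rng.random() < eps else int(np.argmax(best_seen))
    r = pull(k)
    best_seen[k] = max(best_seen[k], r)
    policy_max = max(policy_max, r)

# Oracle baseline: max of T draws from the heaviest-tailed (best) arm.
oracle_max = (rng.pareto(shapes.min(), size=T) + 1.0).max()
# Extreme regret compares the expected maxima of the two sequences.
print(f"policy max: {policy_max:.1f}, oracle max: {oracle_max:.1f}")
```

In this setting, performance is measured by the gap between the expected maximum reward of the policy and that of the best single arm (the extreme regret), rather than by cumulative expected reward as in the classic bandit.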
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9286