Adversarial bandit optimization for approximately linear functions

TMLR Paper3681 Authors

13 Nov 2024 (modified: 02 Mar 2025) · Decision pending for TMLR · CC BY 4.0
Abstract: We consider a variant of standard bandit linear optimization in which, in each trial, the loss function is the sum of a linear function and a small but arbitrary perturbation chosen after observing the player's choice. We give both expected and high-probability regret bounds for this problem. Our result also implies an improved high-probability regret bound for bandit linear optimization, the special case with no perturbation. We also give a lower bound on the expected regret.
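The feedback model described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the dimension `d`, horizon `T`, perturbation budget `eps`, and the adversary's worst-case perturbation rule are all hypothetical choices made here for concreteness. The key features from the abstract are that the perturbation is picked after seeing the player's action, and that only the scalar loss is revealed (bandit feedback).

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, eps = 3, 5, 0.01  # dimension, rounds, perturbation bound (illustrative values)

total_loss = 0.0
for t in range(T):
    theta = rng.normal(size=d)
    theta /= np.linalg.norm(theta)   # linear part of the loss, a unit vector
    x = rng.normal(size=d)
    x /= np.linalg.norm(x)           # player's action, here a random unit vector
    # Perturbation chosen AFTER observing x, bounded by eps in absolute value;
    # sign(theta @ x) is one worst-case choice within that budget.
    delta = eps * np.sign(theta @ x)
    loss = theta @ x + delta         # only this scalar is revealed to the player
    assert abs(loss - theta @ x) <= eps + 1e-12
    total_loss += loss
```

Under this model the cumulative loss can differ from its purely linear counterpart by at most `eps` per round, which is why the perturbation being "small" matters for the regret bounds.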
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: Revisions have been made based on the feedback from all three reviewers, particularly addressing areas that were unclear in the previous version. Specifically, the sections highlighted in red in the newly submitted paper include annotations indicating which reviewer's comment each change addresses.
Assigned Action Editor: ~Gergely_Neu1
Submission Number: 3681