Abstract: We consider a variant of standard bandit linear optimization in which, in each trial, the loss function is the sum of a linear function and a small but arbitrary perturbation chosen after observing the player's choice. We give both expected and high-probability regret bounds for this problem. Our result also implies an improved high-probability regret bound for bandit linear optimization, the special case with no perturbation. We also give a lower bound on the expected regret.
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: Revisions have been made based on feedback from all three reviewers, particularly in areas that were unclear in the previous version. The sections highlighted in red in the newly submitted paper include annotations indicating which reviewer's comment each change addresses.
Assigned Action Editor: ~Gergely_Neu1
Submission Number: 3681