Abstract: We consider a variant of standard bandit linear optimization in which, in each trial, the loss function is the sum of a linear function and a small but arbitrary perturbation chosen after observing the player's choice. We give both expected and high-probability regret bounds for this problem. Our result also implies an improved high-probability regret bound for bandit linear optimization, the special case with no perturbation. We also give a lower bound on the expected regret.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Gergely_Neu1
Submission Number: 3681