Adversarial bandit optimization for approximately linear functions

13 Nov 2024 (modified: 02 Mar 2025)
Abstract: We consider a variant of standard bandit linear optimization in which, in each trial, the loss function is the sum of a linear function and a small but arbitrary perturbation chosen after observing the player’s choice. We give both expected and high-probability regret bounds for the problem. Our result also implies an improved high-probability regret bound for bandit linear optimization, the special case with no perturbation. We also give a lower bound on the expected regret.
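
The setting described in the abstract can be written out as follows. This is a sketch under assumed notation, none of which is taken from the paper: $K$ is the player's decision set, $x_t \in K$ the point played in trial $t$, $\ell_t$ the linear loss vector, $\epsilon_t$ the perturbation (which may depend on $x_t$, since it is chosen after the player moves), and $\epsilon \ge 0$ an assumed uniform bound on its magnitude. The regret shown is one natural definition; the paper's exact definition may differ.

```latex
% Loss in trial t: a linear part plus a bounded, adversarially chosen perturbation.
\[
  f_t(x) = \langle \ell_t, x \rangle + \epsilon_t(x),
  \qquad |\epsilon_t(x)| \le \epsilon \quad \text{for all } x \in K .
\]
% Bandit feedback: the player observes only the scalar f_t(x_t).
% Regret after T trials against the best fixed point in hindsight:
\[
  \mathrm{Regret}_T = \sum_{t=1}^{T} f_t(x_t) - \min_{x \in K} \sum_{t=1}^{T} f_t(x) .
\]
```

Setting $\epsilon = 0$ removes the perturbation and recovers bandit linear optimization, the special case mentioned in the abstract.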
