Abstract: We study the problem of prediction with expert advice, in which the player selects one of N actions in each round, incurs the corresponding loss according to an N-dimensional linear loss vector, and aims to minimize the regret. In this paper, we consider a new measure of the loss functions, which we call L∞-variation. For loss functions with small L∞-variation, if the player is allowed some information related to the variation in each round, we obtain an online bandit algorithm for the problem without using the self-concordance methodology, which conditionally answers an open problem in [8]. A related problem is the combinatorial prediction game, in which the set of actions is a subset of {0,1}^d and the loss function is in [−1,1]^d. We provide an online algorithm in the semi-bandit setting for loss functions with small L∞-variation.
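For readers unfamiliar with the setting, the round-by-round protocol and the notion of regret can be sketched with the classical exponential-weights (Hedge) baseline. This is not the algorithm of the paper: it is the standard full-information method, shown only to make the problem statement concrete. The function names `hedge` and `demo`, the assumption that losses lie in [0,1], the random loss sequence, and the learning rate are all illustrative choices and do not come from the abstract.

```python
import math
import random


def hedge(loss_rounds, eta):
    """Exponential weights over N actions (full-information baseline).

    loss_rounds: list of loss vectors, each of length N, entries assumed in [0, 1].
    Returns (expected cumulative loss of the player, loss of the best fixed action).
    """
    n = len(loss_rounds[0])
    cumulative = [0.0] * n          # cumulative loss of each action so far
    player_loss = 0.0
    for losses in loss_rounds:
        # Play action i with probability proportional to exp(-eta * cumulative[i]).
        # Subtracting the minimum keeps the exponentials numerically stable.
        base = min(cumulative)
        weights = [math.exp(-eta * (c - base)) for c in cumulative]
        total = sum(weights)
        probs = [w / total for w in weights]
        # Linear loss: expected loss is the inner product of probs and losses.
        player_loss += sum(p * l for p, l in zip(probs, losses))
        cumulative = [c + l for c, l in zip(cumulative, losses)]
    return player_loss, min(cumulative)


def demo(T=2000, N=5, seed=0):
    """Run Hedge on an arbitrary (here: random) loss sequence.

    Returns (regret, classical upper bound sqrt(T * ln(N) / 2)).
    """
    rng = random.Random(seed)
    rounds = [[rng.random() for _ in range(N)] for _ in range(T)]
    eta = math.sqrt(8 * math.log(N) / T)   # standard tuning for a known horizon T
    player, best = hedge(rounds, eta)
    return player - best, math.sqrt(T * math.log(N) / 2)


if __name__ == "__main__":
    regret, bound = demo()
    print(f"regret = {regret:.3f}, bound = {bound:.3f}")
```

Regret here is the player's expected cumulative loss minus the cumulative loss of the best single action in hindsight. In the bandit and semi-bandit settings named in the abstract, the player does not observe the full loss vector, so this baseline does not apply directly.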
External IDs: dblp:conf/cocoon/LeeTY14