Abstract: We derive upper and lower bounds for the policy regret of T-round online learning problems with graph-structured feedback, where the adversary is nonoblivious but assumed to have a bounded memory. We obtain upper bounds of O~(T <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2/3</sup> ) and Õ(T <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">3/4</sup> ) for strongly-observable and weakly-observable graphs, respectively, based on analyzing a variant of the Exp3 algorithm. When the adversary is allowed a bounded memory of size 1, we show that a matching lower bound of Ω̃(T <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2/3</sup> ) is achieved in the case of full-information feedback. We also study the particular loss structure of an oblivious adversary with switching costs, and show that in such a setting, non-revealing strongly-observable feedback graphs achieve a lower bound of Ω̃(T <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2/3</sup> ), as well.
0 Replies
Loading