Problem-dependent regret bounds for online learning with feedback graphs

Bingshan Hu

02 Jan 2023 | OpenReview Archive Direct Upload | Readers: Everyone

Abstract: This paper addresses the stochastic multi-armed bandit problem with an undirected feedback graph. We devise a UCB-based algorithm, UCB-NE, and provide a problem-dependent regret bound that depends on a clique covering. Our algorithm obtains regret which provably scales linearly with the clique covering number. Additionally, we provide problem-dependent regret bounds for a Thompson Sampling-based algorithm, TS-N, where again the bounds are linear in the clique covering number. Finally, we present experimental results comparing the practical performance of UCB-NE, TS-N, and a few related algorithms.
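
To make the feedback-graph setting concrete, the following is a minimal sketch of a generic UCB1-style index policy with side observations: pulling an arm also reveals the rewards of its neighbors in the undirected graph. It is not the UCB-NE or TS-N algorithm from the paper, whose details the abstract does not give. The function name `ucb_with_side_observations`, the Bernoulli reward model, the confidence radius `sqrt(2 log t / n)`, and the example instance are all assumptions made for illustration.

```python
import math
import random


def ucb_with_side_observations(means, neighbors, horizon, seed=0):
    """Generic UCB1-style policy on an undirected feedback graph (illustrative only).

    means     -- hypothetical Bernoulli mean of each arm
    neighbors -- dict mapping an arm to the set of arms adjacent to it
    horizon   -- number of rounds to play
    Returns (pseudo_regret, pull_counts).
    """
    rng = random.Random(seed)
    k = len(means)
    obs_count = [0] * k    # number of times each arm's reward was observed
    obs_sum = [0.0] * k    # sum of observed rewards per arm
    pulls = [0] * k        # number of times each arm was actually played
    best = max(means)
    regret = 0.0

    for t in range(1, horizon + 1):
        unobserved = [i for i in range(k) if obs_count[i] == 0]
        if unobserved:
            # Play any arm that has never been observed, directly or as a neighbor.
            arm = unobserved[0]
        else:
            # Index = empirical mean + confidence radius based on observation counts.
            arm = max(
                range(k),
                key=lambda i: obs_sum[i] / obs_count[i]
                + math.sqrt(2.0 * math.log(t) / obs_count[i]),
            )
        pulls[arm] += 1
        regret += best - means[arm]

        # Side observations: the played arm and all of its neighbors are revealed.
        for j in sorted({arm} | set(neighbors.get(arm, ()))):
            reward = 1.0 if rng.random() < means[j] else 0.0
            obs_count[j] += 1
            obs_sum[j] += reward

    return regret, pulls


if __name__ == "__main__":
    # Hypothetical instance: a triangle {0, 1, 2} (one clique) plus an isolated arm 3.
    graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: set()}
    print(ucb_with_side_observations([0.9, 0.5, 0.1, 0.4], graph, horizon=2000))
```

In this example instance the graph is covered by two cliques, the triangle and the isolated arm, so a single pull inside the triangle informs the estimates of all three of its arms. This is the intuition behind bounds that scale with the clique covering number rather than with the number of arms.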
