Abstract: This paper addresses the stochastic multi-armed bandit problem
with an undirected feedback graph. We devise a UCB-based algorithm,
UCB-NE, and provide a problem-dependent regret bound for it that depends
on a clique covering of the graph. The regret of UCB-NE provably scales
linearly with the clique covering number. Additionally, we provide
problem-dependent regret bounds for a Thompson Sampling-based algorithm,
TS-N, where again the bounds are linear in the clique covering number.
Finally, we present experimental results showing how UCB-NE, TS-N, and a
few related algorithms perform in practice.
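
To make the setting concrete, the following is a minimal sketch of a bandit with an undirected feedback graph; it is not the paper's UCB-NE or TS-N, whose details are not given in this abstract. It assumes Bernoulli rewards, a generic UCB1-style index computed from all observations (including side observations from neighbors), and a greedy heuristic for the clique cover, which need not be minimum. The names greedy_clique_cover and ucb_side_observations, and the adjacency format (a dict mapping each arm to the set of its neighbors), are hypothetical.

```python
import math
import random


def greedy_clique_cover(adj):
    """Greedily partition the vertices into cliques.

    This is a heuristic: the number of cliques returned is an upper
    bound on the clique covering number, not necessarily equal to it.
    """
    uncovered = set(adj)
    cover = []
    for v in sorted(adj):
        if v not in uncovered:
            continue
        clique = [v]
        # Add any uncovered neighbor adjacent to every current member.
        for u in sorted(adj[v]):
            if u != v and u in uncovered and all(u in adj[w] for w in clique):
                clique.append(u)
        uncovered.difference_update(clique)
        cover.append(clique)
    return cover


def ucb_side_observations(means, adj, horizon, seed=0):
    """Generic UCB1-style play with graph feedback (illustrative only).

    Pulling an arm reveals a Bernoulli reward for that arm and for each
    of its neighbors. Returns (pull counts, observation counts).
    """
    rng = random.Random(seed)
    arms = sorted(adj)
    pulls = {a: 0 for a in arms}
    obs = {a: 0 for a in arms}
    total = {a: 0.0 for a in arms}

    for t in range(1, horizon + 1):
        unobserved = [a for a in arms if obs[a] == 0]
        if unobserved:
            # Make sure every arm has at least one observation.
            arm = unobserved[0]
        else:
            arm = max(
                arms,
                key=lambda a: total[a] / obs[a]
                + math.sqrt(2.0 * math.log(t) / obs[a]),
            )
        pulls[arm] += 1
        # Side observations: the pulled arm and all of its neighbors.
        for a in {arm} | set(adj[arm]):
            reward = 1.0 if rng.random() < means[a] else 0.0
            obs[a] += 1
            total[a] += reward
    return pulls, obs


# Example graph: a triangle on arms 0, 1, 2 plus an isolated arm 3.
example_adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: set()}
example_means = {0: 0.2, 1: 0.5, 2: 0.4, 3: 0.8}
example_cover = greedy_clique_cover(example_adj)
example_pulls, example_obs = ucb_side_observations(
    example_means, example_adj, horizon=500
)
```

In this example the greedy cover uses two cliques, so a single pull inside the triangle yields observations for all three of its arms. Side information of this kind is what allows a regret bound to depend on the clique covering number instead of the number of arms.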