Abstract: We consider a large number of agents collaborating on a multi-armed bandit problem with a large number of arms. We present an algorithm that improves upon the Gossip-Insert-Eliminate method of Chawla et al. [3]. We provide a regret bound showing that our algorithm is asymptotically optimal, and we present empirical results demonstrating lower regret on simulated data.
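To make the setting concrete, below is a minimal toy sketch of the general gossip-based insert/eliminate idea the abstract refers to: each agent runs UCB1 over a small active subset of the arms and periodically gossips its empirically best arm to a random peer, which inserts it and evicts its worst arm. All parameter values, names, and the simplified protocol here are illustrative assumptions, not the paper's algorithm or the exact method of Chawla et al. [3].

```python
import math
import random

random.seed(0)

K = 10             # total number of arms (illustrative)
N = 5              # number of agents
S = 3              # active-set size per agent
ROUNDS = 2000
GOSSIP_EVERY = 50  # rounds between gossip exchanges

# Hypothetical Bernoulli arm means; arm 0 is clearly best.
MEANS = [0.95] + [0.3] * (K - 1)

class Agent:
    """Runs UCB1 over a small active arm set; accepts gossiped recommendations."""

    def __init__(self, active):
        self.active = set(active)
        self.counts = {a: 0 for a in self.active}
        self.sums = {a: 0.0 for a in self.active}
        self.t = 0

    def emp(self, a):
        # Empirical mean; optimistic (infinite) for unplayed arms so they
        # are tried before being judged and never evicted untried.
        return self.sums[a] / self.counts[a] if self.counts[a] else float("inf")

    def play(self):
        self.t += 1
        arm = max(self.active, key=lambda a: (
            float("inf") if self.counts[a] == 0 else
            self.sums[a] / self.counts[a]
            + math.sqrt(2 * math.log(self.t) / self.counts[a])))
        reward = 1.0 if random.random() < MEANS[arm] else 0.0
        self.counts[arm] += 1
        self.sums[arm] += reward

    def best(self):
        played = [a for a in self.active if self.counts[a] > 0]
        return max(played, key=lambda a: self.sums[a] / self.counts[a])

    def insert(self, rec):
        """Insert a recommended arm, evicting the empirically worst arm
        (never the recommendation itself or the agent's own current best)."""
        if rec in self.active:
            return
        keep = {rec, self.best()}
        victim = min((a for a in self.active if a not in keep), key=self.emp)
        self.active.remove(victim)
        del self.counts[victim], self.sums[victim]
        self.active.add(rec)
        self.counts[rec], self.sums[rec] = 0, 0.0

# Each agent starts with a random subset of arms; ensure at least one agent
# holds the best arm so the recommendation has a chance to spread.
init_sets = [random.sample(range(K), S) for _ in range(N)]
if all(0 not in s for s in init_sets):
    init_sets[0][0] = 0
agents = [Agent(s) for s in init_sets]

for t in range(1, ROUNDS + 1):
    for ag in agents:
        ag.play()
    if t % GOSSIP_EVERY == 0:           # push-style gossip round
        for i, ag in enumerate(agents):
            peer = random.choice([j for j in range(N) if j != i])
            agents[peer].insert(ag.best())

spread = sum(1 for ag in agents if 0 in ag.active)
print(f"{spread}/{N} agents hold the best arm in their active set")
```

The payoff of this family of methods is that each agent pays per-round cost proportional to its small active set rather than the full arm count, while gossip lets good arms propagate through the network.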