Keywords: multi-armed bandits, greedy algorithm, social learning, myopic behavior, learning failures, algorithmic game theory
TL;DR: We analyze exploration failures when myopic agents collectively face a simple multi-armed bandit problem and act in a(ny) way consistent with confidence intervals.
Abstract: We study social learning dynamics motivated by reviews on online platforms. The
agents collectively follow a simple multi-armed bandit protocol, but each agent
acts myopically, without regard to exploration. We allow a wide range of myopic
behaviors that are consistent with (parameterized) confidence intervals for the arms’
expected rewards. We derive stark exploration failures for any such behavior, and
provide matching positive results. As a special case, we obtain the first general
results on failure of the greedy algorithm in bandits, thus providing a theoretical
foundation for why bandit algorithms should explore.
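As a rough illustration of the exploration failure the abstract refers to (not the paper's actual model), here is a minimal Python sketch of the greedy algorithm on a two-armed Bernoulli bandit. The instance (arm means 0.7 vs. 0.3), horizon, and initialization are illustrative assumptions: the point is that greedy, which never explores, can lock onto the inferior arm after a few unlucky draws.

```python
import random

def greedy_bandit(means, horizon, seed=0):
    """Pure greedy on Bernoulli arms: always pull the arm with the
    best empirical mean. Each arm gets one forced initial sample;
    thereafter the algorithm never explores, so one unlucky draw
    from the better arm can lock it onto the worse arm forever.
    (Illustrative sketch, not the paper's model.)"""
    rng = random.Random(seed)
    counts = [1] * len(means)
    # one initial Bernoulli sample per arm
    sums = [float(rng.random() < m) for m in means]
    for _ in range(horizon - len(means)):
        arm = max(range(len(means)), key=lambda a: sums[a] / counts[a])
        sums[arm] += float(rng.random() < means[arm])
        counts[arm] += 1
    return counts

# Fraction of runs in which greedy spends most pulls on the worse arm.
runs = 1000
locked = sum(
    greedy_bandit([0.7, 0.3], horizon=500, seed=s)[1] > 250
    for s in range(runs)
)
print(f"locked onto the worse arm in {locked / runs:.0%} of runs")
```

Across many random seeds the lock-in happens with constant probability (e.g., whenever the better arm's first draw is 0 and the worse arm's is 1), which is the kind of learning failure the paper formalizes and generalizes to confidence-interval-consistent behaviors.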
Submission Number: 5945