Bandit Social Learning under Myopic Behavior

Published: 21 Sept 2023, Last Modified: 02 Nov 2023, NeurIPS 2023 poster
Keywords: multi-armed bandits, greedy algorithm, social learning, myopic behavior, learning failures, algorithmic game theory
TL;DR: We analyze exploration failures when myopic agents collectively face a simple multi-armed bandit problem and act in a(ny) way consistent with confidence intervals.
Abstract: We study social learning dynamics motivated by reviews on online platforms. The agents collectively follow a simple multi-armed bandit protocol, but each agent acts myopically, without regard to exploration. We allow a wide range of myopic behaviors that are consistent with (parameterized) confidence intervals for the arms’ expected rewards. We derive stark exploration failures for any such behavior, and provide matching positive results. As a special case, we obtain the first general results on the failure of the greedy algorithm in bandits, thus providing a theoretical foundation for why bandit algorithms should explore.
Submission Number: 5945
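
The exploration failure described in the abstract can be illustrated with a small simulation. The following is a minimal sketch and not the paper's model or code: it assumes Bernoulli rewards, one initial sample per arm, and a hypothetical index rule `mean + bias * sqrt(width / n)` that picks a point inside a confidence interval. The names `simulate`, `failure_rate`, `bias`, and `width` are illustrative assumptions; `bias = 0` corresponds to the greedy rule.

```python
import math
import random


def simulate(means, horizon, bias=0.0, width=1.0, seed=0):
    """Run one sequence of myopic agents; return the pull count of each arm.

    Each agent sees the full history and picks the arm with the largest
    index mean + bias * sqrt(width / n). For |bias| <= 1 this is a point
    inside the interval [mean - r, mean + r] with r = sqrt(width / n).
    This parameterization is an assumption made for illustration only.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k

    def pull(arm):
        # Bernoulli reward with the arm's true mean (assumed reward model).
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    # One initial sample per arm so that empirical means are defined.
    for arm in range(k):
        pull(arm)

    for _ in range(horizon - k):
        index = [
            sums[a] / counts[a] + bias * math.sqrt(width / counts[a])
            for a in range(k)
        ]
        best = max(index)
        # Ties are broken uniformly at random.
        arm = rng.choice([a for a in range(k) if index[a] == best])
        pull(arm)
    return counts


def failure_rate(means, horizon, runs, bias=0.0):
    """Fraction of runs in which the best arm gets fewer than half the pulls."""
    best_arm = max(range(len(means)), key=lambda a: means[a])
    failures = 0
    for seed in range(runs):
        counts = simulate(means, horizon, bias=bias, seed=seed)
        if counts[best_arm] < horizon / 2:
            failures += 1
    return failures / runs
```

Under these assumptions, calling `failure_rate((0.6, 0.4), horizon=1000, runs=200)` estimates how often the greedy rule settles on the worse arm. In this sketch, with probability 0.4 × 0.4 = 0.16 the initial samples are 0 for the better arm and 1 for the worse arm, after which the greedy rule never returns to the better arm, no matter how long the horizon is.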