The Perils of Misspecified Priors and Optional Stopping in Multi-Armed Bandits

Markus Loecher

Published: 01 Jan 2021, Last Modified: 01 Oct 2024, Frontiers Artif. Intell. 2021, CC BY-SA 4.0

Abstract: The connection between optimal stopping times of American options and multi-armed bandits is the subject of active research. This article investigates the effects of optional stopping in a particular class of multi-armed bandit experiments, namely those that randomly allocate observations to arms in proportion to the Bayesian posterior probability that each arm is optimal (Thompson sampling). The interplay between optional stopping and prior mismatch is examined. We propose a novel partitioning of regret into peri-testing and post-testing components. We further show that the parameters of interest depend strongly on the assumed prior probability density.
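
The allocation rule described in the abstract can be illustrated with a minimal sketch. It assumes Bernoulli rewards with a Beta prior on each arm, which the abstract does not specify; the function name `thompson_sampling` and the parameters `prior_a`, `prior_b`, and `seed` are hypothetical and chosen only for illustration. Drawing one sample per arm from its posterior and playing the arm with the largest draw selects each arm with probability equal to the posterior probability that it is optimal.

```python
import random


def thompson_sampling(true_rates, n_rounds, prior_a=1.0, prior_b=1.0, seed=0):
    """Beta-Bernoulli Thompson sampling (illustrative sketch, not the paper's code).

    true_rates: assumed success probability of each arm (unknown to the sampler).
    prior_a, prior_b: Beta prior parameters shared by all arms (an assumption).
    Returns the number of pulls and the number of successes per arm.
    """
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k

    for _ in range(n_rounds):
        # One draw from each arm's Beta posterior.
        draws = [
            rng.betavariate(prior_a + successes[i], prior_b + failures[i])
            for i in range(k)
        ]
        # Play the arm whose posterior draw is largest.
        arm = max(range(k), key=draws.__getitem__)
        pulls[arm] += 1

        # Simulate a Bernoulli reward and update that arm's counts.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return pulls, successes


pulls, successes = thompson_sampling([0.1, 0.9], n_rounds=500)
```

Changing `prior_a` and `prior_b` away from values that match the true rates is one simple way to experiment with the kind of prior mismatch the abstract refers to.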