Abstract: The classification bandit problem is a new class of multi-armed bandit problems in which an agent must classify a given set of arms as positive or negative, depending on whether the number of bad arms is at least $$N_2$$ or at most $$N_1(<N_2)$$, while drawing as few arms as possible. In our problem setting, bad arms are imperfectly characterized as the arms with above-threshold expected rewards (losses). We develop a method for reducing classification bandits to simpler one-threshold classification bandits and propose an algorithm that classifies a given set of arms correctly with a specified confidence. Our numerical experiments demonstrate the effectiveness of the proposed method.
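To make the problem setting concrete, here is a minimal sketch of a naive baseline. It is not the reduction or the algorithm proposed in the abstract. It assumes losses bounded in [0, 1], treats an arm as bad when its expected loss is strictly above a threshold `theta` (ignoring the "imperfect characterization"), and uses round-robin sampling with anytime Hoeffding confidence bounds. All function and parameter names (`classify_arms`, `hoeffding_radius`, `theta`, `n1`, `n2`, `delta`, `max_pulls`) are hypothetical.

```python
import math
import random


def hoeffding_radius(n: int, num_arms: int, delta: float) -> float:
    """Anytime confidence radius for a [0, 1]-bounded arm after n pulls.

    A union bound over all arms and all pull counts n >= 1 keeps the total
    failure probability below delta (the sum over n of 1/n^2 is pi^2/6 < 2).
    """
    return math.sqrt(math.log(4 * num_arms * n * n / delta) / (2 * n))


def classify_arms(pull, num_arms, theta, n1, n2, delta, max_pulls=10**6):
    """Naive round-robin baseline (a hypothetical sketch, NOT the paper's method).

    pull(i) returns a loss in [0, 1] for arm i; an arm counts as "bad" if its
    expected loss exceeds theta. Assumes the promise: #bad >= n2 or #bad <= n1.
    Returns ("positive" | "negative" | "undecided", total number of pulls).
    """
    counts = [0] * num_arms
    sums = [0.0] * num_arms
    status = [None] * num_arms  # None = unresolved, True = bad, False = good
    total = 0
    while total < max_pulls:
        num_bad = sum(1 for s in status if s is True)
        num_good = sum(1 for s in status if s is False)
        if num_bad >= n2:
            return "positive", total  # at least N2 arms are certified bad
        if num_good >= num_arms - n1:
            return "negative", total  # hence at most N1 arms can be bad
        if all(s is not None for s in status):
            return "undecided", total  # promise violated or a bound failed
        # Pull every unresolved arm once, then try to certify its label.
        for i in range(num_arms):
            if status[i] is not None:
                continue
            sums[i] += pull(i)
            counts[i] += 1
            total += 1
            mean = sums[i] / counts[i]
            radius = hoeffding_radius(counts[i], num_arms, delta)
            if mean - radius > theta:
                status[i] = True   # lower confidence bound above threshold
            elif mean + radius < theta:
                status[i] = False  # upper confidence bound below threshold
    return "undecided", total


def bernoulli_arms(means, seed):
    """Hypothetical Bernoulli loss arms for the usage example."""
    rng = random.Random(seed)
    return lambda i: 1.0 if rng.random() < means[i] else 0.0


# Three of five arms exceed theta = 0.5, and N2 = 3, so the set is positive.
label, pulls = classify_arms(bernoulli_arms([0.9, 0.8, 0.2, 0.1, 0.85], 0),
                             num_arms=5, theta=0.5, n1=1, n2=3, delta=0.05)
print(label, pulls)
```

The round-robin rule is chosen only for simplicity. A label is certified only when its confidence interval lies entirely on one side of `theta`, so whenever all bounds hold (probability at least 1 - `delta`), a "positive" or "negative" answer is correct. A smarter sampling rule would aim to reduce the number of draws, which is the quantity the problem asks to minimize.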