Anytime algorithms for multi-armed bandit problems

Robert D. Kleinberg

Published: 2006, Last Modified: 17 May 2023, Venue: SODA 2006, Readers: Everyone

Abstract: How should a decision-maker perform repeated choices so as to optimize the average cost or benefit of those choices in the long run? This question motivates the theory of online learning, which encompasses problems such as the well-known best-expert [13, 9] and multi-armed bandit [10, 1] problems. This paper concerns a new approach to dealing with multi-armed bandit problems in which the decision-maker's strategy set is large (exponential or possibly infinite). Recent theoretical progress on the analysis of algorithms for such problems (e.g. [2, 3, 8, 11, 14]) has led to improved online algorithms for problems in areas such as online routing [2], dynamic pricing mechanisms [4, 5, 12], and analysis of reputation systems in e-commerce and peer-to-peer networks [3].
