Generalized Objectives in Adaptive Experiments: The Frontier between Regret and Speed

Chao Qin; Daniel Russo

Published: 27 Oct 2023, Last Modified: 22 Dec 2023 (RealML-2023)

Keywords: multi-armed bandits, regret minimization, best-arm identification, Thompson sampling

Abstract: This paper formulates a generalized model of multi-armed bandit experiments that accommodates both cumulative regret minimization and best-arm identification objectives. We identify the optimal instance-dependent scaling of the cumulative cost across experimentation and deployment, which is expressed in the familiar form uncovered by Lai and Robbins (1985). We show that the nature of asymptotically efficient algorithms is nearly independent of the cost functions, emphasizing a remarkable universality phenomenon. Balancing various cost considerations is reduced to an appropriate choice of exploitation rate. Additionally, we explore the Pareto frontier between the length of the experiment and the cumulative regret across experimentation and deployment. A notable and universal feature is that even a slight reduction in the exploitation rate (from one to a slightly lower value) results in a substantial decrease in the experiment's length, accompanied by only a minimal increase in the cumulative regret.

Submission Number: 57
