Comparing Multi-Armed Bandit Algorithms and Q-learning for Multiagent Action Selection: a Case Study in Route Choice

Thiago B. F. de Oliveira, Ana L. C. Bazzan, Bruno C. da Silva, Ricardo Grunitzki

IJCNN 2018

Abstract: The multi-armed bandit (MAB) problem concerns an agent that must choose which arm of a slot machine to play in order to optimize its reward. A family of reinforcement learning algorithms has been developed to tackle this problem, including variants that consider more than one agent (thus characterizing a repeated game) and variants designed for non-stationary settings. In this paper, we evaluate the performance of some of these MAB algorithms and compare them with Q-learning when applied to a non-stationary repeated game in which commuter agents must learn to choose a route that minimizes their travel time.
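
The abstract does not say which MAB variants or which road network were used, so the following is only a minimal sketch under stated assumptions. It uses UCB1 as a representative bandit algorithm, stateless (single-state) epsilon-greedy Q-learning, and a hypothetical two-route network with linear travel-time functions. Each commuter is an independent learner: its arms are the routes, and its reward is the travel time mapped into [0, 1] so that shorter trips earn more. Because every other commuter is learning at the same time, the reward an agent observes for a given route drifts, which is what makes the repeated game non-stationary. All names (`FREE_FLOW`, `SLOPE`, `T_MAX`, `travel_times`, `simulate`, the `noise` parameter) are illustrative and do not come from the paper.

```python
import math
import random

# Hypothetical toy network: two parallel routes whose travel time (minutes)
# grows linearly with the number of commuters using them.
FREE_FLOW = [10.0, 12.0]
SLOPE = [0.05, 0.02]
T_MAX = 50.0  # assumed bound used to map travel times into a [0, 1] reward


def travel_times(choices):
    """Return each route's travel time given every agent's route choice."""
    flows = [choices.count(r) for r in range(len(FREE_FLOW))]
    return [FREE_FLOW[r] + SLOPE[r] * flows[r] for r in range(len(FREE_FLOW))]


class UCB1Agent:
    """UCB1 bandit: sample-average estimates plus an exploration bonus."""

    def __init__(self, n_arms, rng):
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms
        self.t = 0
        self.rng = rng  # unused by UCB1; kept so both agents share one constructor

    def choose(self):
        self.t += 1
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm  # play every arm once before using the bonus

        def score(a):
            return self.values[a] + math.sqrt(2 * math.log(self.t) / self.counts[a])

        return max(range(len(self.counts)), key=score)

    def update(self, arm, reward):
        # Incremental sample average: every past reward has equal weight.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


class StatelessQAgent:
    """Single-state Q-learning with epsilon-greedy exploration."""

    def __init__(self, n_arms, rng, alpha=0.1, epsilon=0.1):
        self.q = [0.0] * n_arms
        self.alpha, self.epsilon, self.rng = alpha, epsilon, rng

    def choose(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q))
        return max(range(len(self.q)), key=lambda a: self.q[a])

    def update(self, arm, reward):
        # With a single state there is no successor-state term, so the update
        # is an exponential recency-weighted average of the observed rewards.
        self.q[arm] += self.alpha * (reward - self.q[arm])


def simulate(agent_cls, n_agents=200, episodes=300, noise=1.0, seed=0):
    """Run repeated route choice; return mean travel time in the last episode."""
    rng = random.Random(seed)
    agents = [agent_cls(len(FREE_FLOW), random.Random(rng.randrange(2**32)))
              for _ in range(n_agents)]
    for _ in range(episodes):
        choices = [agent.choose() for agent in agents]
        times = travel_times(choices)
        for agent, route in zip(agents, choices):
            # Hypothetical per-agent Gaussian noise so identical agents can diverge.
            experienced = max(0.0, times[route] + rng.gauss(0.0, noise))
            agent.update(route, 1.0 - experienced / T_MAX)
    return sum(times[r] for r in choices) / n_agents
```

For example, `simulate(UCB1Agent)` and `simulate(StatelessQAgent)` each return the mean travel time of the final episode. In this toy network the user equilibrium, where both routes take equally long, places about 86 of the 200 commuters on route 0 at roughly 14.3 minutes, which gives a reference point for judging either learner. The main design difference is the step size: UCB1 weights all past rewards equally, whereas the constant `alpha` in Q-learning favours recent rewards, which may help when the other agents keep changing their behaviour.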
