Keywords: Multi-agent Reinforcement Learning, Bandits, Two-sided Matching
Abstract: Two-sided matching under uncertainty has recently drawn much attention due to its wide range of applications.
Existing works on matching bandits mainly focus on the one-sided learning setting and design algorithms that converge to a stable matching with low regret. In this paper, we consider the more general two-sided learning setting, i.e., participants on both sides have to learn their preferences over the other side through repeated interactions.
Inspired by the classical result that the optimal matching for the proposing side can be obtained via the Gale-Shapley algorithm, we ask whether this result still holds in the two-sided learning setting. To address this question, we formally introduce the two-sided learning setting and design strategies for both the arm side and the player side, without restrictive assumptions such as a special preference structure or the observability of winning players.
Our results not only answer this question in the affirmative but also establish a near-optimal upper bound, achieving $O(\log T)$ regret.
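For reference, the sketch below illustrates the classical Gale-Shapley (deferred acceptance) algorithm the abstract cites, under full, known preferences on both sides. It is the textbook procedure, not the paper's bandit algorithm; all names and data structures here are illustrative.

def gale_shapley(player_prefs, arm_prefs):
    """Return the player-optimal stable matching (players propose).

    player_prefs[p] : list of arms, most preferred first
    arm_prefs[a]    : list of players, most preferred first
    """
    # rank[a][p] = position of player p in arm a's preference list
    rank = {a: {p: i for i, p in enumerate(prefs)}
            for a, prefs in arm_prefs.items()}
    next_proposal = {p: 0 for p in player_prefs}  # index of next arm to propose to
    matched_arm = {}                              # arm -> currently held player
    free_players = list(player_prefs)

    while free_players:
        p = free_players.pop()
        a = player_prefs[p][next_proposal[p]]
        next_proposal[p] += 1
        if a not in matched_arm:
            matched_arm[a] = p                    # arm accepts its first proposal
        elif rank[a][p] < rank[a][matched_arm[a]]:
            free_players.append(matched_arm[a])   # arm trades up; old player freed
            matched_arm[a] = p
        else:
            free_players.append(p)                # proposal rejected; p stays free
    return {p: a for a, p in matched_arm.items()}

# Hypothetical example: player p1's top choice a1 prefers p2, so p1 ends up with a2.
players = {"p1": ["a1", "a2"], "p2": ["a1", "a2"]}
arms = {"a1": ["p2", "p1"], "a2": ["p1", "p2"]}
print(gale_shapley(players, arms))  # {'p2': 'a1', 'p1': 'a2'}

Since each player proposes to each arm at most once, the loop terminates, and the resulting matching is stable and optimal for the proposing side, which is exactly the classical property the paper revisits under two-sided learning.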
Supplementary Material: zip
List Of Authors: Zhang, Yirui and Fang, Zhixuan
Latex Source Code: zip
Signed License Agreement: pdf
Submission Number: 314