TL;DR: An Extended Gale-Shapley algorithm with exploration finds stable matchings in bandit learning under two-sided reward uncertainty.
Abstract: We study bandit learning in matching markets with two-sided reward uncertainty, extending prior research that has primarily focused on single-sided uncertainty. Leveraging the concept of "super-stability" from Irving (1994), we demonstrate the advantage of the Extended Gale-Shapley (GS) algorithm over the standard GS algorithm for achieving true stable matchings under incomplete information. By employing the Extended GS algorithm, our centralized algorithm attains logarithmic pessimal stable regret that depends on an instance-dependent admissible-gap parameter. This algorithm is further adapted to a decentralized setting with only a constant increase in regret. Finally, we establish a novel centralized instance-dependent lower bound for binary stable regret, elucidating the roles of the admissible gap and super-stable matching in characterizing the complexity of stable matching with bandit feedback.
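The interplay between deferred acceptance and bandit exploration can be illustrated with a short sketch. This is not the paper's Extended GS algorithm: it is a minimal, standard deferred-acceptance routine paired with a hypothetical UCB-style preference builder, assuming a centralized planner and strict preference orders induced by point estimates. The paper's Extended GS additionally handles the ties that arise when confidence intervals overlap, which is what enables super-stable matchings; all function names and parameters below are illustrative assumptions.

```python
import math

def gale_shapley(proposer_prefs, receiver_ranks):
    """Standard deferred acceptance (not the paper's Extended GS).
    proposer_prefs[i]: ordered list of receivers for proposer i.
    receiver_ranks[j][i]: receiver j's rank of proposer i (lower = preferred).
    Returns a proposer-optimal stable matching as {proposer: receiver}."""
    n = len(proposer_prefs)
    next_choice = [0] * n      # next receiver each proposer will try
    match_of = {}              # receiver -> current tentative proposer
    free = list(range(n))
    while free:
        i = free.pop()
        j = proposer_prefs[i][next_choice[i]]
        next_choice[i] += 1
        if j not in match_of:
            match_of[j] = i                          # j was unmatched
        elif receiver_ranks[j][i] < receiver_ranks[j][match_of[j]]:
            free.append(match_of[j])                 # j trades up
            match_of[j] = i
        else:
            free.append(i)                           # j rejects i
    return {i: j for j, i in match_of.items()}

def ucb_prefs(means, counts, t, c=2.0):
    """Hypothetical helper: order arms by a UCB index built from
    empirical means and pull counts. Under two-sided uncertainty,
    an analogous index would be maintained on both market sides."""
    idx = [m + math.sqrt(c * math.log(t) / max(k, 1))
           for m, k in zip(means, counts)]
    return sorted(range(len(means)), key=lambda a: -idx[a])

# Toy 3x3 market: proposers rank receivers via UCB estimates,
# receivers hold fixed (known) ranks over proposers.
means = [[0.9, 0.5, 0.1], [0.4, 0.8, 0.3], [0.2, 0.6, 0.7]]
counts = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
prefs = [ucb_prefs(means[i], counts[i], t=100) for i in range(3)]
receiver_ranks = [[1, 0, 2], [0, 1, 2], [2, 0, 1]]
print(gale_shapley(prefs, receiver_ranks))  # e.g. {0: 0, 1: 1, 2: 2}
```

In this sketch, exploration enters only through the confidence bonus in the preference index; the paper's approach instead treats overlapping confidence intervals as genuine indifference and runs Extended GS over the resulting partial orders, so the returned matching is stable under every preference profile consistent with the data.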
Lay Summary: Imagine trying to set up fair and lasting partnerships, like matching people for jobs or dates online, where neither side initially knows exactly what they want or how good a match will truly be. This research presents a smarter way to find these "stable" partnerships, especially in complex scenarios where both sides are learning as they go. By using a more advanced matching algorithm, this paper shows how systems can quickly learn to make excellent matches with very few mistakes over time, whether they operate through a central system or in a decentralized way. This work also establishes new limits for understanding how difficult it is to achieve truly stable partnerships when information is limited.
Primary Area: General Machine Learning->Online Learning, Active Learning and Bandits
Keywords: Bandits, Matching Markets, Stable Matching, Super Stability
Submission Number: 808