Keywords: Contextual Bandits, Matching Markets, Optimal Stable Regret
TL;DR: We develop fully adaptive algorithms with theoretical guarantees for contextual bandit learning in matching markets, for both stochastic and adversarial contexts.
Abstract: We study bandit learning in matching markets, where players and arms constitute the two market sides, and the players' utilities are linear in the arm contexts. In each round, new arms arrive with observable contexts. Then, the algorithm matches them to players, aiming to minimize each player's regret against a *stable matching benchmark*. This contextual structure creates significant complexity: subtle context shifts can slightly alter one player's utility while completely reconfiguring the underlying benchmark, causing large regret spikes for others.
We address this in two settings: *stochastic* contexts, drawn from a latent distribution, and *adversarial* contexts, which may be arbitrary.
For the stochastic case, we introduce a novel minimum preference gap that captures the learning difficulty of an instance and provide a fully adaptive algorithm with an instance-dependent poly-logarithmic regret upper bound. We also establish matching instance-independent regret upper and lower bounds under a mild distributional assumption. For the adversarial setting, we propose a tractable regret notion that remains valid under arbitrary contexts, and we give an adaptive algorithm achieving an instance-independent sublinear regret bound.
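To make the stable matching benchmark concrete, the sketch below shows one round of the setting described in the abstract: player utilities are linear in observed arm contexts, and the benchmark is a stable matching computed from those utilities via player-proposing Gale-Shapley. This is an illustrative reconstruction of the problem setup, not the paper's algorithm; the fixed arm-side preferences and the specific numbers are assumptions for the example.

```python
import numpy as np

def stable_matching(prefs_players, prefs_arms):
    """Player-proposing Gale-Shapley; returns a player -> arm matching.

    prefs_players[p] lists arms in p's decreasing order of utility;
    prefs_arms[a] lists players in a's decreasing order of preference.
    """
    n = len(prefs_players)
    next_prop = [0] * n        # next arm each player will propose to
    arm_holder = {}            # arm -> player it currently holds
    free = list(range(n))
    # Precompute arm-side ranks for O(1) comparisons.
    rank = [{p: r for r, p in enumerate(prefs_arms[a])} for a in range(n)]
    while free:
        p = free.pop()
        a = prefs_players[p][next_prop[p]]
        next_prop[p] += 1
        if a not in arm_holder:
            arm_holder[a] = p
        elif rank[a][p] < rank[a][arm_holder[a]]:
            free.append(arm_holder[a])  # arm a trades up; old holder is free
            arm_holder[a] = p
        else:
            free.append(p)              # proposal rejected
    return {p: a for a, p in arm_holder.items()}

# One round with 2 players and 2 newly arrived arms (toy numbers):
theta = np.array([[1.0, 0.0],   # player 0's latent preference vector
                  [0.0, 1.0]])  # player 1's latent preference vector
contexts = np.array([[2.0, 1.0],   # arm 0's observable context
                     [1.0, 2.0]])  # arm 1's observable context
utilities = theta @ contexts.T     # utilities[p, a] = <theta_p, x_a>
prefs_players = [list(np.argsort(-utilities[p])) for p in range(2)]
prefs_arms = [[0, 1], [0, 1]]      # assumed fixed arm-side preferences
benchmark = stable_matching(prefs_players, prefs_arms)
```

A learner that only estimates `theta` can plug its estimates into the same routine; the abstract's point is that small estimation errors in `utilities` can flip a preference order and reconfigure the entire stable matching, which is what makes per-player regret against this benchmark delicate.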
Track: Long Paper
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 115