Bandit Learning in Matching Markets with Switching Cost

11 Sept 2025 (modified: 18 Jan 2026) | ICLR 2026 Conference Withdrawn Submission | Readers: Everyone | CC BY 4.0
Keywords: Matching Market, Bandit Algorithm, Switching Cost, Parallel Shortest Hamiltonian Circuit Problem
TL;DR: Reducing the switching cost in matching markets under the bandit learning framework
Abstract: We study the bandit learning problem in two-sided matching markets. While existing works successfully derive sub-linear bounds for the player-optimal regret, they typically assume cost-free switching and may incur up to $O(T)$ switches over a time horizon of length $T$. Such frequent reassignments are impractical in real-world applications, since switching is usually costly and disruptive. To address this limitation, we explicitly incorporate switching costs into the decision-making process and aim to minimize the player-optimal stable regret under a switching-cost budget. We first consider a setting with unit switching cost, where every switch incurs the same fixed cost. We propose a cost-aware algorithm that achieves the same regret bound of $O(\log T/\Delta^2)$ as previous approaches while reducing the total number of switches to $O(\log T)$, where $\Delta$ is the players' minimum preference gap. Furthermore, we show that by slightly relaxing the regret to $O(\sqrt{T/\Delta^2})$, the total number of switches can be reduced to $O(\log \log T)$; in the extreme case, with only $O(1)$ switches, the algorithm still guarantees a regret of $O(T^{2/3})$. We also generalize this approach to the heterogeneous switching-cost setting by leveraging shortest Hamiltonian circuit orderings, and we provide analogous theoretical guarantees.
Primary Area: learning theory
Submission Number: 4082
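
Illustration (not from the submission): the switch counts quoted in the abstract match what one gets when a player commits to one arm for a whole epoch and may change arms only at epoch boundaries. The sketch below counts those boundaries for two standard batching grids: doubling epochs, which give on the order of $\log T$ boundaries, and a grid with end times $T^{1-2^{-i}}$, which gives on the order of $\log \log T$. The function names, the grids, and the stopping rule are assumptions made for this example; the abstract does not say which schedule the proposed algorithm uses.

```python
import math


def doubling_epoch_ends(horizon):
    """Epoch end times 1, 2, 4, ... with the last epoch ending at `horizon`."""
    ends = []
    t = 1
    while t < horizon:
        ends.append(t)
        t *= 2
    ends.append(horizon)
    return ends


def double_exp_epoch_ends(horizon):
    """Epoch end times ceil(horizon ** (1 - 2 ** -i)) for i = 1, 2, ...

    The grid stops once an end time is within a factor 2 of the horizon
    (an assumed stopping rule), and the last epoch ends at `horizon`.
    """
    ends = []
    i = 1
    while True:
        t = math.ceil(horizon ** (1 - 2.0 ** -i))
        if 2 * t >= horizon:
            break
        if not ends or t > ends[-1]:
            ends.append(t)
        i += 1
    ends.append(horizon)
    return ends


def max_switches(epoch_ends):
    """Upper bound on switches per player: one per boundary between epochs."""
    return len(epoch_ends) - 1


if __name__ == "__main__":
    for horizon in (2 ** 10, 2 ** 16, 2 ** 20):
        print(
            horizon,
            max_switches(doubling_epoch_ends(horizon)),
            max_switches(double_exp_epoch_ends(horizon)),
        )
```

Running the script prints, for each horizon, the per-player switch bound under each grid. The bound only counts opportunities to switch; it says nothing about regret, which depends on how arms are chosen within each epoch.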