Keywords: Multi-armed Bandits, Many-to-one Matching, Uniqueness Conditions
Abstract: An emerging line of research is dedicated to the problem of one-to-one matching markets with bandits, where the preference of one side is unknown and thus we need to match while learning the preference through multiple rounds of interaction. However, in many real-world applications such as online recruitment platform for short-term workers, one side of the market can select more than one participant from the other side, which motivates the study of the many-to-one matching problem. Moreover, the existence of a unique stable matching is crucial to the competitive equilibrium of the market. In this paper, we first introduce a more general new \textit{$\tilde{\alpha}$}-condition to guarantee the uniqueness of stable matching in many-to-one matching problems, which generalizes some established uniqueness conditions such as \textit{SPC} and \textit{Serial Dictatorship}, and recovers the known $\alpha$-condition if the problem is reduced to one-to-one matching. Under this new condition, we design an MO-UCB-D4 algorithm with $O\left(\frac{NK\log(T)}{\Delta^2}\right)$ regret bound, where $T$ is the time horizon, $N$ is the number of agents, $K$ is the number of arms, and $\Delta$ is the minimum reward gap. Extensive experiments show that our algorithm achieves uniform good performances under different uniqueness conditions.
Supplementary Material: zip
20 Replies
Loading