Multi-Player Approaches for Dueling Bandits

Published: 22 Jan 2025, Last Modified: 08 Mar 2025 · AISTATS 2025 Poster · CC BY 4.0
TL;DR: We introduce a multiplayer dueling bandit problem for distributed systems with preference-based information, proposing approaches that match lower bounds and outperform single-player benchmarks.
Abstract: We introduce a multiplayer dueling bandit problem, tailored for distributed systems where only preference-based information is available and motivated by recent advances in deep learning with human feedback. Compared to multiplayer bandits, this setting poses the challenge of controlling the collaborative exploration of non-informative arm pairs. We demonstrate that the direct use of a Follow Your Leader black-box approach matches the asymptotic regret lower bound when known dueling bandit algorithms are used as a foundation. Additionally, we propose and analyze a message-passing, fully distributed approach with a novel Condorcet-winner recommendation protocol, which yields expedited exploration in the non-asymptotic regime. Our experimental comparisons show that our multiplayer algorithms surpass single-player benchmark algorithms, underscoring their efficacy in addressing the nuanced challenges of multiplayer dueling bandits.
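The paper's algorithms are not reproduced on this page, but the Follow Your Leader structure described in the abstract (one player selects arm pairs to duel; the others copy its choice, pooling feedback) can be illustrated with a toy sketch. Everything below is an assumption for illustration: the function names, the naive explore-then-lead heuristic used as the leader's dueling strategy, and the example preference matrix are ours, not the authors' method.

```python
import random

def duel(pref, i, j, rng):
    """Sample one duel: return 1 if arm i beats arm j, per preference matrix pref."""
    return 1 if rng.random() < pref[i][j] else 0

def run(pref, n_players=3, horizon=2000, seed=0):
    """Toy Follow-Your-Leader loop: the leader picks a pair each round,
    all n_players duel that same pair, and wins are pooled."""
    rng = random.Random(seed)
    k = len(pref)
    wins = [[0] * k for _ in range(k)]    # wins[i][j]: pooled times i beat j
    counts = [[0] * k for _ in range(k)]  # counts[i][j]: pooled duels of (i, j)

    def recommend():
        # Leader heuristic (our assumption, not the paper's algorithm):
        # pick the arm with the best average empirical win rate, then duel it
        # against its least-explored rival.
        def score(i):
            s, n = 0.0, 0
            for j in range(k):
                if i != j and counts[i][j] > 0:
                    s += wins[i][j] / counts[i][j]
                    n += 1
            return s / n if n else 0.5
        best = max(range(k), key=score)
        rival = min((j for j in range(k) if j != best),
                    key=lambda j: counts[best][j])
        return best, rival

    for t in range(horizon):
        # Short round-robin warm-up over all pairs, then follow the leader.
        i, j = recommend() if t >= k * k else (t % k, (t // k) % k)
        if i == j:
            j = (i + 1) % k
        for _ in range(n_players):  # followers copy the leader's pair
            w = duel(pref, i, j, rng)
            wins[i][j] += w
            counts[i][j] += 1
            wins[j][i] += 1 - w
            counts[j][i] += 1

    # Return the empirical Condorcet winner estimate, if one exists.
    def beats_all(i):
        return all(wins[i][j] > counts[i][j] / 2
                   for j in range(k) if j != i and counts[i][j] > 0)
    return next((i for i in range(k) if beats_all(i)), None)
```

With a preference matrix in which arm 0 beats every other arm with probability above 1/2 (i.e., arm 0 is the Condorcet winner), the pooled statistics identify it quickly:

```python
pref = [[0.5, 0.7, 0.8],
        [0.3, 0.5, 0.6],
        [0.2, 0.4, 0.5]]
print(run(pref))  # identifies arm 0, the Condorcet winner
```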
Submission Number: 532