Multi-Player Approaches for Dueling Bandits
TL;DR: We introduce a multiplayer dueling bandit problem for distributed systems with preference-based information, proposing approaches that match lower bounds and outperform single-player benchmarks.
Abstract: Fine-tuning large deep networks with preference-based human feedback has shown promising results. As the number of users grows and tasks shift to complex data such as images or videos, distributed approaches become essential for gathering feedback efficiently.
To address this, we introduce a multiplayer dueling bandit problem, highlighting that exploring non-informative candidate pairs becomes especially challenging in a collaborative environment.
We show that a Follow Your Leader black-box approach matches the asymptotic regret lower bound when built on top of known dueling bandit algorithms.
Additionally, we propose and analyze a fully distributed, message-passing approach with a novel Condorcet-winner recommendation protocol; it speeds up exploration in the non-asymptotic regime and thereby reduces regret. Our experiments show that our multiplayer algorithms outperform single-player benchmarks, underscoring their efficacy in the nuanced challenges of this setting.
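To make the Follow Your Leader idea concrete, here is a minimal toy simulation: one leader chooses which pair of arms to duel, every player copies that choice, and all duel outcomes are pooled. This is an illustrative sketch under simplifying assumptions (a known preference matrix for simulation, a naive explore-then-exploit leader), not the paper's actual algorithm or its regret guarantees; all function and variable names here are hypothetical.

```python
import random

def follow_your_leader(pref, n_players=4, horizon=2000, seed=0):
    """Toy multiplayer dueling-bandit simulation in the Follow-Your-Leader
    spirit: a leader picks the duel, followers replicate it, outcomes are
    pooled. pref[i][j] = probability that arm i beats arm j in a duel.
    Returns the empirically best arm (a candidate Condorcet winner)."""
    rng = random.Random(seed)
    K = len(pref)
    wins = [[0] * K for _ in range(K)]  # pooled duel outcomes across players
    pairs = [(i, j) for i in range(K) for j in range(K) if i < j]

    def win_rate(a):
        # Empirical win rate of arm a over all duels it participated in.
        w = sum(wins[a][b] for b in range(K) if b != a)
        n = sum(wins[a][b] + wins[b][a] for b in range(K) if b != a)
        return w / n if n else 0.5

    for t in range(horizon):
        if t < horizon // 2:
            # Exploration phase: the leader cycles over all candidate pairs.
            i, j = pairs[t % len(pairs)]
        else:
            # Exploitation phase: duel the leader's current champion
            # against itself (a self-duel incurs no regret in this toy model).
            i = j = max(range(K), key=win_rate)
        for _ in range(n_players):  # every player copies the leader's pair
            if i == j:
                continue  # a self-duel yields no new information
            if rng.random() < pref[i][j]:
                wins[i][j] += 1
            else:
                wins[j][i] += 1
    return max(range(K), key=win_rate)
```

In a setting with a clear Condorcet winner, e.g. `pref = [[0.5, 0.9, 0.9], [0.1, 0.5, 0.7], [0.1, 0.3, 0.5]]`, the pooled statistics identify arm 0. The point of pooling is visible here: with `n_players` copies of each duel, the leader accumulates outcome statistics `n_players` times faster than a single player would, which is the intuition behind the speedup over single-player baselines.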
Submission Number: 532