Keywords: dueling bandits, distributed optimization, game theory
TL;DR: We present and analyze a new algorithm for preference-based distributed maximization.
Abstract: We consider the problem of learning to optimize a welfare objective in a multi-agent system coordinated by a central authority. This setting presents two main challenges: (i) the welfare function is unknown or difficult to specify explicitly, and (ii) centralized optimization is intractable because its cost grows exponentially with the number of agents. We address these challenges by combining preference-based learning with a game-theoretic reformulation of the central optimization problem. By designing agents' utilities to be aligned with the social welfare, this formulation lets agents learn independently while still maximizing the welfare value. Specifically, we propose a novel algorithm that iteratively combines dueling-bandit-style preference learning with game-theoretic no-regret learning to guide agents' actions. Under a submodularity assumption on the welfare function, we prove that our proposed algorithm has sublinear regret. This regret guarantee further implies that, with high probability, the average welfare over $T$ rounds is near-optimal up to a constant depending on the curvature of the welfare function. Finally, we validate our approach in a case study on rebalancing a shared mobility system, where vehicles are placed strategically across different areas.
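The idea of aligning agents' utilities with the social welfare can be illustrated with a minimal sketch. It is not the algorithm proposed in the paper: it assumes the welfare function is known, uses a hypothetical coverage-style (submodular) welfare over three made-up areas, and replaces preference-based no-regret learning with plain best-response updates. The names `COVERAGE`, `welfare`, `marginal_utility`, `swap`, and `best_response_dynamics` are all illustrative assumptions. Each agent's utility is its marginal contribution to welfare, so any unilateral change in an agent's utility equals the resulting change in welfare.

```python
# Hypothetical toy instance (not from the paper): each area, if it hosts
# at least one vehicle, covers a set of demand points.
COVERAGE = {"A": {1, 2, 3}, "B": {3, 4}, "C": {5}}
AREAS = sorted(COVERAGE)


def welfare(placement):
    """Coverage welfare: number of demand points covered by some vehicle.

    `placement` is a tuple with one area per agent; None marks an agent
    that has been removed from the system.
    """
    covered = set()
    for area in placement:
        if area is not None:
            covered |= COVERAGE[area]
    return len(covered)


def swap(placement, i, area):
    """Return a copy of `placement` in which agent i plays `area`."""
    return placement[:i] + (area,) + placement[i + 1:]


def marginal_utility(i, placement):
    """Utility of agent i: welfare with agent i minus welfare without it."""
    return welfare(placement) - welfare(swap(placement, i, None))


def best_response_dynamics(placement, max_rounds=10):
    """Agents take turns switching to a strictly better area.

    This is a stand-in for independent learning; it assumes full
    knowledge of the welfare function, unlike the preference-based setting.
    """
    placement = tuple(placement)
    for _ in range(max_rounds):
        changed = False
        for i in range(len(placement)):
            best = max(AREAS, key=lambda a: marginal_utility(i, swap(placement, i, a)))
            candidate = swap(placement, i, best)
            if marginal_utility(i, candidate) > marginal_utility(i, placement):
                placement, changed = candidate, True
        if not changed:
            break  # no agent can improve: a pure Nash equilibrium
    return placement


if __name__ == "__main__":
    final = best_response_dynamics(("C", "C"))
    print(final, welfare(final))
```

Starting from both vehicles in area `C`, the dynamics in this toy instance settle on one vehicle in `A` and one in `C`, which covers four demand points, the maximum achievable with two vehicles here. In general, equilibria of such games are only guaranteed to be approximately optimal, which is consistent with the curvature-dependent constant mentioned in the abstract.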
Paper Type: Standard paper
Submission Number: 37