Multi-Agent Combinatorial-Multi-Armed-Bandit framework for the Submodular Welfare Problem under Bandit Feedback
Keywords: Multi-Agent Combinatorial Multi-Armed Bandits, Resilience guarantees, Bandit feedback, Submodular optimization
TL;DR: This paper establishes resilience guarantees for the Submodular Welfare Problem and introduces MA-CMAB, a framework that turns noisy value-oracle offline welfare algorithms into online multi-agent partition-bandit policies under full-bandit feedback.
Abstract: We study the \emph{Submodular Welfare Problem} (SWP), which asks for a partition of the items among multiple agents with monotone submodular utilities that maximizes utilitarian welfare, in a setting where only \emph{bandit feedback} (aggregate outcomes) is observable.
Classically, SWP assumes that each agent's valuation for every subset is available to the algorithm, which makes the problem a special case of monotone submodular maximization under a matroid constraint. For this classical setting, existing literature shows that the greedy algorithm guarantees a $1/2$-approximation and that continuous greedy with pipage/randomized rounding attains the optimal $(1-1/e)$-approximation in the value-oracle model. Existing online variants, by contrast, have largely focused on \emph{single-agent} combinatorial multi-armed bandits (CMAB) or on multi-agent reductions with partial communication and separable objectives, where the action is a single subset and the feedback is (semi-)bandit. For SWP under \emph{full-bandit} feedback with non-communicating agents, we introduce {MA-CMAB}, a \emph{multi-agent} combinatorial multi-armed bandit framework in which the \emph{partition} itself is the core action. In this online setup, we show that an explore-then-commit reduction with a discrete randomized assignment policy achieves $\tilde{\mathcal{O}}(T^{2/3})$ regret against a $(1-1/e)$-approximation benchmark for partition-based submodular welfare. To our knowledge, this is the first regret guarantee for \emph{partition-based} submodular welfare in a non-communicating multi-agent bandit model, distinguishing our setting from best-subset selection in CMAB and from separable multi-agent formulations.
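As a concrete illustration of the offline baseline mentioned in the abstract (not code from the submission), below is a minimal sketch of the classical value-oracle greedy for SWP: each item is assigned to the agent with the largest marginal gain, which yields the $1/2$-approximation when utilities are monotone submodular. The item/agent indices and the coverage-style utility are hypothetical examples.

```python
# Minimal sketch, assuming exact value-oracle access (the classical SWP setting).
# The coverage utility and the item/agent labels below are illustrative only.
from typing import Callable, Dict, List, Set

def greedy_swp(items: List[int], agents: List[int],
               value: Callable[[int, Set[int]], float]) -> Dict[int, Set[int]]:
    """value(a, S) returns agent a's utility for bundle S (value-oracle model)."""
    bundles: Dict[int, Set[int]] = {a: set() for a in agents}
    for item in items:
        # Marginal gain of adding `item` to each agent's current bundle.
        gains = {a: value(a, bundles[a] | {item}) - value(a, bundles[a]) for a in agents}
        best = max(gains, key=gains.get)  # assign item to the agent with largest gain
        bundles[best].add(item)
    return bundles

if __name__ == "__main__":
    # Hypothetical coverage-style (monotone submodular) utilities for two agents.
    coverage = {0: {1: {"x"}, 2: {"x", "y"}}, 1: {1: {"z"}, 2: {"w"}}}

    def value(agent: int, bundle: Set[int]) -> float:
        covered: Set[str] = set()
        for i in bundle:
            covered |= coverage[agent][i]
        return float(len(covered))

    print(greedy_swp(items=[1, 2], agents=[0, 1], value=value))
```

The online setting studied in the paper removes this exact oracle: per the abstract, values must instead be estimated from full-bandit (aggregate) feedback during an exploration phase before committing to a partition.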
Area: Learning and Adaptation (LEARN)
Generative AI: I acknowledge that I have read and will follow this policy.
Submission Number: 1674