Stochastic k-Submodular Bandits with Full Bandit Feedback

Published: 2025 · Last Modified: 25 Sept 2025 · AAMAS 2025 · License: CC BY-SA 4.0
Abstract: In this paper, we present the first sublinear α-regret bounds for online k-submodular optimization problems with full-bandit feedback, where α is the corresponding offline approximation ratio. Specifically, we propose online algorithms for several k-submodular stochastic combinatorial multi-armed bandit problems: (i) monotone functions with individual size constraints, (ii) monotone functions with matroid constraints, (iii) non-monotone functions with matroid constraints, (iv) non-monotone functions without constraints, and (v) monotone functions without constraints. We transform approximation algorithms for offline k-submodular maximization problems into online algorithms using the offline-to-online framework proposed by [9]. A key contribution of our work is analyzing the robustness of these offline algorithms, which the framework requires.
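To illustrate the general flavor of an offline-to-online conversion under full-bandit feedback, the following is a minimal Python sketch, not the paper's algorithm or the framework of [9]: an offline greedy for monotone k-submodular maximization with a total size budget is run using marginal gains estimated purely from repeated noisy plays of candidate solutions (full-bandit feedback), and the resulting solution is then exploited for the remaining rounds. All names here (`explore_then_commit_greedy`, `noisy_value`, the toy modular objective) are hypothetical illustrations under simplified assumptions; the paper's algorithms and regret analysis are more refined.

```python
import random

def noisy_value(assignment, f, noise=0.1):
    """Simulate full-bandit feedback: one noisy evaluation of f at the
    played solution (no marginal-gain or per-element information)."""
    return f(assignment) + random.gauss(0.0, noise)

def estimate_value(assignment, f, num_samples):
    """Average repeated noisy plays of the same solution to estimate f."""
    return sum(noisy_value(assignment, f) for _ in range(num_samples)) / num_samples

def explore_then_commit_greedy(items, k, budget, f, horizon, samples_per_query=20):
    """Sketch of an explore-then-commit conversion of an offline greedy for
    monotone k-submodular maximization with a total size constraint.

    Exploration: build a solution greedily, estimating marginal gains only
    from noisy full-bandit evaluations.
    Exploitation: play the constructed solution for the remaining rounds.
    """
    solution = {}          # partial assignment: item -> type in {0, ..., k-1}
    rounds_used = 0

    for _ in range(budget):
        best_gain, best_pair = float("-inf"), None
        base = estimate_value(solution, f, samples_per_query)
        rounds_used += samples_per_query
        for item in items:
            if item in solution:
                continue
            for t in range(k):
                candidate = dict(solution)
                candidate[item] = t
                gain = estimate_value(candidate, f, samples_per_query) - base
                rounds_used += samples_per_query
                if gain > best_gain:
                    best_gain, best_pair = gain, (item, t)
        if best_pair is None:
            break
        solution[best_pair[0]] = best_pair[1]

    # Exploitation phase: repeatedly play the greedy solution.
    remaining = max(0, horizon - rounds_used)
    total_reward = sum(noisy_value(solution, f) for _ in range(remaining))
    return solution, total_reward

# Toy example: a nonnegative modular (hence monotone k-submodular) objective.
weights = {"a": [1.0, 0.5], "b": [0.3, 0.9], "c": [0.6, 0.6]}
f = lambda assignment: sum(weights[i][t] for i, t in assignment.items())
sol, reward = explore_then_commit_greedy(list(weights), k=2, budget=2, f=f, horizon=5000)
```

The sketch trades exploration rounds for estimation accuracy in the greedy step; the paper's framework instead converts an offline algorithm's robustness to noisy value estimates into an α-regret guarantee over the full horizon.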