Logarithmic Regret for Unconstrained Submodular Maximization Stochastic Bandit

Published: 18 Dec 2024, Last Modified: 14 Feb 2025. Venue: ALT 2025. License: CC BY 4.0
Abstract: We address the *online unconstrained submodular maximization problem* (Online USM) in a setting with *stochastic bandit feedback*. In this framework, a decision-maker receives noisy rewards from a non-monotone submodular function taking values in a known bounded interval. This paper proposes *Double-Greedy - Explore-then-Commit* (DG-ETC), which adapts the Double-Greedy approach from the offline and online full-information settings. DG-ETC simultaneously satisfies an $O(d\log(dT))$ problem-dependent upper bound on the $1/2$-approximate pseudo-regret and an $O(dT^{2/3}\log(dT)^{1/3})$ problem-free one, outperforming existing approaches. In particular, we introduce a problem-dependent notion of hardness that characterizes the transition between the logarithmic and polynomial regimes of the upper bounds.
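For orientation, the following is a minimal sketch of the classic offline randomized Double-Greedy procedure that the abstract says DG-ETC adapts. It is not DG-ETC itself: it assumes exact, noiseless access to the set function, whereas the paper's setting provides only noisy bandit rewards. The names `double_greedy` and `cut_value`, and the example graph, are illustrative assumptions and do not come from the paper.

```python
import random
from typing import Callable, FrozenSet, Iterable, Optional, Tuple


def double_greedy(
    f: Callable[[FrozenSet[int]], float],
    d: int,
    rng: Optional[random.Random] = None,
) -> FrozenSet[int]:
    """Offline randomized Double-Greedy with an exact value oracle (assumption).

    Maintains X (grows from the empty set) and Y (shrinks from the full
    ground set {0, ..., d-1}); after d steps the two sets coincide.
    """
    rng = rng or random.Random()
    X = set()
    Y = set(range(d))
    for i in range(d):
        # Marginal gain of adding i to X, and of removing i from Y.
        a = f(frozenset(X | {i})) - f(frozenset(X))
        b = f(frozenset(Y - {i})) - f(frozenset(Y))
        a_pos, b_pos = max(a, 0.0), max(b, 0.0)
        # Convention: keep the element when both clipped gains are zero.
        p = 1.0 if a_pos + b_pos == 0 else a_pos / (a_pos + b_pos)
        if rng.random() < p:
            X.add(i)
        else:
            Y.discard(i)
    return frozenset(X)


def cut_value(edges: Iterable[Tuple[int, int]], S: FrozenSet[int]) -> int:
    """Number of edges with exactly one endpoint in S.

    Graph cuts are a standard example of a non-monotone submodular function.
    """
    return sum((u in S) != (v in S) for u, v in edges)
```

As a usage example, with the hypothetical 4-cycle `edges = [(0, 1), (1, 2), (2, 3), (3, 0)]`, calling `double_greedy(lambda S: cut_value(edges, S), 4)` returns a random subset of `{0, 1, 2, 3}`. In the bandit setting the marginal gains `a` and `b` cannot be computed exactly and would have to be estimated from repeated noisy evaluations, which is where an explore-then-commit scheme comes in.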
Submission Number: 34