An Explore-then-Commit Algorithm for Submodular Maximization Under Full-bandit Feedback

Guanyu Nie; Mridul Agarwal; Abhishek Kumar Umrawal; Vaneet Aggarwal; Christopher John Quinn

Published: 20 May 2022, Last Modified: 05 May 2023. UAI 2022 Poster. Readers: Everyone

Keywords: Multi-armed bandit, submodular, bandit feedback

TL;DR: We propose a simple but efficient algorithm for combinatorial multi-armed bandits with stochastic submodular (in expectation) rewards and full-bandit feedback.

Abstract: We investigate the problem of combinatorial multi-armed bandits with stochastic submodular (in expectation) rewards and full-bandit feedback, where no extra information other than the reward of the selected action at each time step $t$ is observed. We propose a simple algorithm, Explore-Then-Commit Greedy (ETCG), and prove that it achieves a $(1-1/e)$-regret upper bound of $\mathcal{O}(n^{\frac{1}{3}}k^{\frac{4}{3}}T^{\frac{2}{3}}\log(T)^{\frac{1}{2}})$ for a horizon $T$, number of base elements $n$, and cardinality constraint $k$. We also show in experiments with synthetic and real-world data that ETCG empirically outperforms other full-bandit methods.
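
The abstract names the algorithm but does not spell out its steps, so the following is only a minimal sketch of a generic explore-then-commit greedy scheme under full-bandit feedback, not the authors' exact procedure. It assumes the set is built in $k$ greedy phases, that each candidate set is played a fixed number of times `m`, and that only the noisy reward of the played set is observed. The names `etc_greedy` and `play`, and the parameter `m`, are hypothetical; the paper's choice of exploration length is not reproduced here.

```python
from typing import Callable, FrozenSet, Tuple


def etc_greedy(
    n: int,
    k: int,
    horizon: int,
    m: int,
    play: Callable[[FrozenSet[int]], float],
) -> Tuple[FrozenSet[int], float]:
    """Sketch of explore-then-commit greedy with full-bandit feedback.

    n: number of base elements (labelled 0..n-1)
    k: cardinality constraint
    horizon: total number of rounds T
    m: plays per candidate set during exploration (hypothetical parameter)
    play: returns the (possibly noisy) reward of the set that is played;
          this is the only feedback the procedure uses
    """
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # Phase i (0-based) evaluates n - i candidates, m plays each.
    explore_rounds = m * sum(n - i for i in range(k))
    if m < 1 or explore_rounds > horizon:
        raise ValueError("exploration does not fit within the horizon")

    chosen: set = set()
    total_reward = 0.0
    rounds_used = 0

    # Exploration: grow the set one element per phase.
    for _ in range(k):
        best_arm, best_mean = -1, float("-inf")
        for arm in range(n):
            if arm in chosen:
                continue
            action = frozenset(chosen | {arm})
            reward_sum = 0.0
            for _ in range(m):
                reward = play(action)
                reward_sum += reward
                total_reward += reward
                rounds_used += 1
            mean = reward_sum / m
            # Strict comparison: ties go to the lowest-index element.
            if mean > best_mean:
                best_arm, best_mean = arm, mean
        chosen.add(best_arm)

    # Commit: play the selected set for all remaining rounds.
    final_set = frozenset(chosen)
    for _ in range(horizon - rounds_used):
        total_reward += play(final_set)

    return final_set, total_reward
```

In this sketch a larger `m` gives more reliable empirical means but leaves fewer rounds for the commit phase, which is the trade-off that a regret bound growing as $T^{2/3}$ typically reflects in explore-then-commit methods.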

Supplementary Material: zip
