Keywords: submodular, supermodular, interactive, bandit, UCB, online learning, submodularity ratio, submodularity curvature
TL;DR: We provide regret bounds for the maximization, under bandit feedback, of functions that can be decomposed as a sum of submodular and supermodular components, or are approximately submodular.
Abstract: In the context of online interactive machine learning with combinatorial objectives, we extend purely submodular prior work to more general non-submodular objectives. These include: (1) objectives that decompose additively into a monotone submodular term plus a monotone supermodular term (known as a BP decomposition); and (2) objectives that are only weakly submodular. Both cases allow representing not only competitive (submodular) but also complementary (supermodular) relationships between objects, extending this setting to a broader range of applications (e.g., movie recommendations, medical treatments) where such relationships are beneficial. In the two-term case, moreover, we study not only the typical monolithic feedback model but also a novel framework in which feedback is available separately for each term. With real-world practicality and scalability in mind, we integrate Nyström sketching techniques to significantly improve computational complexity, including in the purely submodular case. In the Gaussian process contextual bandits setting, we show sublinear theoretical regret bounds in all cases. We also empirically show good applicability to recommendation systems and data subset selection.
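To make the BP decomposition concrete, the following is a minimal illustrative sketch (not the authors' code, and offline rather than bandit): an objective f(S) = f_sub(S) + f_sup(S) combining a monotone submodular coverage term with a monotone supermodular pairwise-complement term, maximized by plain greedy under a cardinality constraint. The items, coverage sets, and complement weights are made-up toy data.

```python
# Illustrative sketch of a BP objective f(S) = f_sub(S) + f_sup(S):
# a monotone submodular coverage term plus a monotone supermodular
# pairwise-synergy term, maximized greedily under |S| <= k.
# (Toy data; not the paper's bandit algorithm.)
from itertools import combinations

# Hypothetical items: each covers some elements (submodular part)...
cover = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"d"}, 3: {"a", "d", "e"}}
# ...and some pairs of items complement each other (supermodular part).
complement = {frozenset({0, 2}): 1.0, frozenset({1, 3}): 2.0}

def f_sub(S):
    # Coverage: size of the union of covered elements (monotone submodular).
    return len(set().union(*(cover[i] for i in S))) if S else 0

def f_sup(S):
    # Sum of nonnegative pairwise synergies (monotone supermodular).
    return sum(complement.get(frozenset(p), 0.0) for p in combinations(S, 2))

def f(S):
    return f_sub(S) + f_sup(S)

def greedy(k):
    # Standard greedy: repeatedly add the item with the largest gain in f.
    S = []
    for _ in range(k):
        best = max((i for i in cover if i not in S),
                   key=lambda i: f(S + [i]))
        S.append(best)
    return S

S = greedy(2)
print(S, f(S))  # picks items 3 and 1, which cover 5 elements and synergize
```

Greedy here first picks item 3 (largest coverage), then item 1, whose coverage gain plus its synergy with item 3 outweighs the alternatives; the supermodular term is what can make such complementary pairs preferable to individually strong items.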
Supplementary Material: zip
List Of Authors: Narang, Adhyyan and Sadeghi, Omid and Ratliff, Lillian and Fazel, Maryam and Bilmes, Jeff
Latex Source Code: zip
Signed License Agreement: pdf
Code Url: https://github.com/AdhyyanNarang/onlineBP
Submission Number: 619