Efficient Algorithms for Combinatorial-Bandits with Monotonicity

Published: 22 Sept 2025 (last modified: 01 Dec 2025), NeurIPS 2025 Workshop. License: CC BY 4.0
Keywords: Combinatorial bandits, top-k arms identification, non-linear reward functions, full-bandit feedback, sub-Gaussian noise, pure exploration, sample complexity, stochastic bandits, multi-armed bandits, machine learning, decision theory
TL;DR: Solving general top-$k$ combinatorial multi-armed bandits with a pairwise-preference algorithm
Abstract: In this paper, we study the problem of combinatorial bandits with stochastic full-bandit feedback, where the feedback is aggregated using an unknown operator. Unlike traditional combinatorial bandits, where the feedback is often assumed to be linear, we consider a setting with non-linear reward functions determined by an arbitrary monotonic operator, which generalizes various aggregation methods, including maximum, minimum, and k-order statistics. The challenge arises from the need to identify the top-$k$ arms efficiently without explicit knowledge of the underlying aggregation operator. We propose novel algorithms that leverage sub-Gaussian noise assumptions and gap-based analysis to provide strong theoretical guarantees on the sample complexity of identifying the top-$k$ arms with high confidence. We apply a modified version of an algorithm designed for pairwise preferences. Our results extend to any monotonic aggregation function, provided the noise is sub-Gaussian, and show that our methods achieve strong performance in terms of both time and sample efficiency.
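To make the pure-exploration setting concrete, here is a minimal sketch of a standard gap-based successive-elimination scheme for top-$k$ identification under sub-Gaussian noise. This is not the paper's algorithm (which works from aggregated full-bandit feedback on subsets via pairwise preferences); it only illustrates the confidence-interval / elimination machinery the abstract refers to. The function `pull` and all parameter names are illustrative assumptions.

```python
import math
import random


def top_k_successive_elimination(pull, n_arms, k, delta=0.05, sigma=1.0,
                                 max_rounds=5000):
    """Gap-based successive elimination for top-k identification (generic sketch).

    `pull(i)` returns one noisy reward for arm i, assumed sigma-sub-Gaussian.
    Each round, every still-active arm is pulled once. An arm is accepted when
    its lower confidence bound beats the upper bound of the (k+1)-th best
    empirical arm, and eliminated when its upper bound falls below the lower
    bound of the k-th best. Succeeds with probability >= 1 - delta under a
    crude union bound.
    """
    active = list(range(n_arms))
    accepted = []
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    t = 0
    while len(accepted) < k and t < max_rounds:
        t += 1
        for i in active:
            sums[i] += pull(i)
            counts[i] += 1
        # sub-Gaussian confidence radius; all active arms have t samples
        rad = sigma * math.sqrt(2.0 * math.log(4.0 * n_arms * t * t / delta) / t)
        means = {i: sums[i] / counts[i] for i in active}
        need = k - len(accepted)
        order = sorted(active, key=lambda i: means[i], reverse=True)
        if len(active) <= need:
            accepted += order          # everything left must be in the top k
            active = []
            break
        lcb_kth = means[order[need - 1]] - rad   # k-th best lower bound
        ucb_next = means[order[need]] + rad      # (k+1)-th best upper bound
        for i in order:
            if means[i] - rad > ucb_next:
                accepted.append(i)               # clearly inside the top k
                active.remove(i)
            elif means[i] + rad < lcb_kth:
                active.remove(i)                 # clearly outside the top k
    if len(accepted) < k:
        # budget exhausted: fall back to the empirical ranking
        order = sorted(active, key=lambda i: sums[i] / counts[i], reverse=True)
        accepted += order[: k - len(accepted)]
    return set(accepted)
```

In the monotonic-aggregation setting of the paper, the per-arm `pull` oracle above would be replaced by comparisons inferred from aggregated subset feedback, which is where the pairwise-preference machinery comes in.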
Submission Number: 159