Set-Size Dependent Combinatorial Bandits

27 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: Combinatorial Multi-armed Bandit, Set-Size Dependent, Online learning
TL;DR: This paper proposes a variant of the traditional CMAB model and presents new sorting-based algorithms.
Abstract: This paper introduces and studies a new variant of Combinatorial Multi-Armed Bandits (\CMAB{}), called Set-Size Dependent Combinatorial Multi-Armed Bandits (\SDMAB{}). In \SDMAB{}, each base arm is associated with a set of reward distributions instead of a single distribution as in \CMAB{}, and the reward distribution of each base arm depends on the set size, i.e., the number of base arms in the chosen super arm. \SDMAB{} therefore involves a much larger exploration set of super arms than the basic \CMAB{} model. An important property called order preservation holds in \SDMAB{}, i.e., the ordering of the base arms' reward means is independent of the set size; this property arises widely in real-world applications. We propose the \SUCB{} algorithm, which effectively leverages the order preservation property to shrink the exploration set. We prove a regret upper bound of $O\left(\max\left\{\frac{M\delta_L}{\Delta_{L}},\frac{L^2}{\Delta_S}\right\}\log(T)\right)$ for \SUCB{}, which improves on the $O\left(\frac{ML^2}{\Delta_S}\log(T)\right)$ regret of classic \CMAB{} algorithms, where $M$ denotes the number of base arms, $L$ denotes the maximum number of base arms in a super arm, and $\delta$ and $\Delta$ are gap-related quantities. We also derive a lower bound, which can be informally written as $\Omega\left(\max\left\{\min_{k\in[L]}\left\{\frac{(M-L)\delta_{k}}{\Delta_{k}^2}\right\},\frac{L^2}{\Delta_S}\right\}\log(T)\right)$, showing that \SUCB{} is partially tight. We conduct numerical experiments demonstrating the strong empirical performance of \SUCB{}.
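To make the setting concrete, the following is a minimal illustrative sketch of a naive UCB-style baseline for the set-size dependent model described above: it keeps independent statistics for every (set size, base arm) pair and, at each round, plays the size-$k$ super arm with the highest summed UCB. All names (`sdmab_ucb`, the Bernoulli reward model) are hypothetical illustration choices, not the authors' \SUCB{} algorithm, which additionally exploits order preservation to share ordering information across set sizes and shrink the exploration set.

```python
import math
import random

def sdmab_ucb(mu, L, T, seed=0):
    """Naive UCB baseline for set-size dependent combinatorial bandits.

    mu[k-1][i] = mean reward of base arm i when the super arm has size k.
    Order preservation would mean the ranking of arms by mu[k-1] is the
    same for every k; this baseline does not exploit that structure.
    Returns the total reward collected over T rounds.
    """
    rng = random.Random(seed)
    M = len(mu[0])
    # Per (set size, arm) pull counts and reward sums.
    counts = [[0] * M for _ in range(L)]
    sums = [[0.0] * M for _ in range(L)]
    total = 0.0
    for t in range(1, T + 1):
        best_val, best = -1.0, None
        for k in range(1, L + 1):
            ucbs = []
            for i in range(M):
                n = counts[k - 1][i]
                # Unpulled (size, arm) pairs get infinite optimism.
                u = (float("inf") if n == 0
                     else sums[k - 1][i] / n + math.sqrt(2 * math.log(t) / n))
                ucbs.append((u, i))
            ucbs.sort(reverse=True)
            top = ucbs[:k]  # best k arms by UCB at this set size
            val = sum(u for u, _ in top)
            if val > best_val:
                best_val, best = val, (k, [i for _, i in top])
        k, arms = best
        for i in arms:
            # Bernoulli rewards, an assumption made for this sketch only.
            r = 1.0 if rng.random() < mu[k - 1][i] else 0.0
            counts[k - 1][i] += 1
            sums[k - 1][i] += r
            total += r
    return total
```

Note that this baseline must explore all $M \times L$ (size, arm) pairs separately, which is the inefficiency the abstract's $O\left(\frac{ML^2}{\Delta_S}\log(T)\right)$ classic bound reflects; leveraging order preservation is what lets \SUCB{} avoid it.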
Supplementary Material: zip
Primary Area: learning theory
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 10815
