Set-Size Dependent Combinatorial Bandits

27 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: Combinatorial Multi-armed Bandit, Set-Size Dependent, Online learning
TL;DR: This paper proposes a variant of the traditional CMAB model and presents new sorting-based algorithms.
Abstract: This paper introduces and studies a new variant of Combinatorial Multi-Armed Bandits (\CMAB{}), called Set-Size Dependent Combinatorial Multi-Armed Bandits (\SDMAB{}). In \SDMAB{}, each base arm is associated with a set of reward distributions instead of a single distribution as in \CMAB{}, and the reward distribution of each base arm depends on the set size, i.e., the number of base arms in the chosen super arm. \SDMAB{} therefore involves a much larger exploration set of super arms than the basic \CMAB{} model. An important property called order preservation holds in \SDMAB{}, i.e., the ordering of the base arms' reward means is independent of the set size; this property arises widely in real-world applications. We propose the \SUCB{} algorithm, which effectively leverages the order preservation property to shrink the exploration set. We prove a regret upper bound of $O\left(\max\left\{\frac{M\delta_L}{\Delta_{L}},\frac{L^2}{\Delta_S}\right\}\log(T)\right)$ for \SUCB{}, which improves on the $O\left(\frac{ML^2}{\Delta_S}\log(T)\right)$ regret of classic \CMAB{} algorithms, where $M$ denotes the number of base arms, $L$ denotes the maximum number of base arms in a super arm, and $\delta$ and $\Delta$ are gap-related quantities. We also derive a lower bound, which can be informally written as $\Omega\left(\max\left\{\min_{k\in[L]}\left\{\frac{(M-L)\delta_{k}}{\Delta_{k}^2}\right\},\frac{L^2}{\Delta_S}\right\}\log(T)\right)$, showing that \SUCB{} is partially tight. We conduct numerical experiments demonstrating the strong empirical performance of \SUCB{}.
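To make the setting concrete, the following is a minimal illustrative sketch of a naive UCB-style baseline for the set-size dependent model described above: it keeps independent statistics for every (set size, base arm) pair and, at each round, plays the size-$k$ super arm with the highest summed UCB. All names (`sdmab_ucb`, the Bernoulli reward model) are hypothetical illustration choices, not the authors' \SUCB{} algorithm, which additionally exploits order preservation to share ordering information across set sizes and shrink the exploration set.

```python
import math
import random

def sdmab_ucb(mu, L, T, seed=0):
    """Naive UCB baseline for set-size dependent combinatorial bandits.

    mu[k-1][i] = mean reward of base arm i when the super arm has size k.
    Order preservation would mean the ranking of arms by mu[k-1] is the
    same for every k; this baseline does not exploit that structure.
    Returns the total reward collected over T rounds.
    """
    rng = random.Random(seed)
    M = len(mu[0])
    # Per (set size, arm) pull counts and reward sums.
    counts = [[0] * M for _ in range(L)]
    sums = [[0.0] * M for _ in range(L)]
    total = 0.0
    for t in range(1, T + 1):
        best_val, best = -1.0, None
        for k in range(1, L + 1):
            ucbs = []
            for i in range(M):
                n = counts[k - 1][i]
                # Unpulled (size, arm) pairs get infinite optimism.
                u = (float("inf") if n == 0
                     else sums[k - 1][i] / n + math.sqrt(2 * math.log(t) / n))
                ucbs.append((u, i))
            ucbs.sort(reverse=True)
            top = ucbs[:k]  # best k arms by UCB at this set size
            val = sum(u for u, _ in top)
            if val > best_val:
                best_val, best = val, (k, [i for _, i in top])
        k, arms = best
        for i in arms:
            # Bernoulli rewards, an assumption made for this sketch only.
            r = 1.0 if rng.random() < mu[k - 1][i] else 0.0
            counts[k - 1][i] += 1
            sums[k - 1][i] += r
            total += r
    return total
```

Note that this baseline must explore all $M \times L$ (size, arm) pairs separately, which is the inefficiency the abstract's $O\left(\frac{ML^2}{\Delta_S}\log(T)\right)$ classic bound reflects; leveraging order preservation is what lets \SUCB{} avoid it.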
Supplementary Material: zip
Primary Area: learning theory
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 10815
