Abstract: This paper revisits the multi-play multi-armed bandit problem with shareable arm capacities, a model tailored to resource allocation problems arising in LLM inference serving, edge intelligence, and related applications. We investigate the capacity-scarce setting, a common dilemma in practical resource allocation. Existing works yield sub-optimal solutions in this setting because they rely heavily on the assumption of abundant capacities. This paper presents a fairly complete solution to the capacity-scarce setting, making three key contributions.
First, we establish a minimax lower bound on the sample complexity of learning the arm capacities and propose an algorithm that exactly matches this lower bound. Second, we derive both instance-independent and instance-dependent regret lower bounds for learning the optimal play assignment. Third, we introduce an efficient exploration algorithm, \texttt{PC-CapUL}, designed for the capacity-scarce setting, which matches the regret lower bounds up to a constant factor.
\texttt{PC-CapUL} features a novel index that coordinates exploration across multiple plays. Experiments demonstrate significant improvements over existing methods.
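
To make the setting concrete, below is a minimal sketch of a shareable-capacity bandit environment and a greedy capacity-aware assignment rule. The reward model (each arm has an unknown capacity and a per-unit mean reward, and plays beyond capacity are wasted) follows the standard shareable-capacity formulation; the specific numbers, function names, and the greedy rule are illustrative assumptions, not the paper's \texttt{PC-CapUL} algorithm.

import numpy as np

# Sketch of a multi-play bandit with shareable arm capacities (assumed model,
# not the paper's exact protocol): arm k has unknown capacity cap[k] and
# per-unit mean reward mu[k]; assigning a_k plays to arm k yields
# min(a_k, cap[k]) * (mu[k] + noise), i.e. plays beyond capacity are wasted.

rng = np.random.default_rng(0)

K = 3                            # number of arms
N = 4                            # total plays per round
mu = np.array([0.9, 0.6, 0.5])   # per-unit mean rewards (hypothetical)
cap = np.array([1, 2, 3])        # arm capacities (scarce regime: total capacity is small)

def pull(assignment):
    """Reward of one round for a given length-K play assignment."""
    effective = np.minimum(assignment, cap)        # capacity truncates the load
    noise = rng.normal(0.0, 0.1, size=K)
    return float(np.sum(effective * (mu + noise)))

def greedy_assignment(mu_hat, cap_hat, n_plays):
    """Fill arms in decreasing order of estimated per-unit reward,
    up to their estimated capacities."""
    a = np.zeros(K, dtype=int)
    for k in np.argsort(-mu_hat):
        take = min(n_plays, int(cap_hat[k]))
        a[k] = take
        n_plays -= take
        if n_plays == 0:
            break
    return a

# With the true parameters, this greedy rule recovers the optimal assignment
# for the additive, capacity-truncated reward model sketched above.
a_star = greedy_assignment(mu, cap, N)
print("assignment:", a_star, "expected reward:", np.sum(np.minimum(a_star, cap) * mu))

The learning problem the paper studies is exactly that mu and cap are unknown, so the assignment must be discovered from noisy, capacity-truncated feedback.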
Supplementary Material: pdf
Submission Number: 63