\section{The Magician's Problem with Multiple Workers}\label{sec:3}

In this section, we first formally describe the GMPMW. We then discuss the online environment and our algorithm design objective, including a formal definition of the competitive ratio adopted for GMP.

\subsection{Problem Formulation}
A sequence of $T$ tasks arrives at the decision maker one at a time. For each task, the decision maker may choose one of $L$ workers, indexed by $l\in[L]$, to process it. When task $t$ arrives, the decision maker makes an irrevocable decision $x_{t,l}\in\{0,1\}$ for each worker $l$: the task is either assigned to a worker $l$ (i.e., $x_{t,l}=1$) or rejected (i.e., $x_{t,l}=0$ for all $l$).

If task $t$ is processed by worker $l\in [L]$, it generates a reward $u_{t,l}$.
If task $t$ is rejected, it is discarded and cannot be processed in the future.
The reward $u_{t,l}$ of processing task $t$ with worker $l$ is determined by hidden factors (e.g., user satisfaction) and is therefore unobservable when the decision is made; it is revealed only after the task is completed.
Although $u_{t,l}$ is not known in advance, it varies within a range $[\underline{u}_l, \overline{u}_l]$, and the upper bound $\overline{u}_l$ and the lower bound $\underline{u}_l$ are known to the decision maker. We assume that $u_{t,l}\geq0$.

The decision maker has a total resource budget of $K$ for the workers. When task $t$ arrives, the probability distribution $\mathcal{R}_{t,l}$ for the resource consumption of each worker $l$ to process this task becomes known. This progressively improving observability of task cost reflects practical scenarios where resource consumption is influenced by measurable external factors (e.g., temperature) or observable task attributes (e.g., size), allowing for estimation upon the task’s arrival. When the decision maker assigns task $t$ to worker $l$, the worker consumes an amount of resource $r_{t,l}$ that is drawn from $\mathcal{R}_{t,l}$, which is then deducted from the remaining amount of the total resource budget. 
Without loss of generality, we assume that $r_{t,l} \in (0, 1]$. We further assume that $\mathcal{R}_{t,l}$ has a continuous probability density function (PDF) $g_{t,l}(\cdot)$, with corresponding cumulative distribution function (CDF) $G_{t,l}(\cdot)$.
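
As a simple illustration, suppose (hypothetically) that $\mathcal{R}_{t,l}$ is uniform on $(0,1]$, so that $g_{t,l}(r)=1$ and $G_{t,l}(r)=r$ for $r\in(0,1]$. If the remaining budget when task $t$ arrives is $b\in(0,1]$, where the symbol $b$ is introduced only for this example, then the probability that the resource consumption of worker $l$ fits within the remaining budget is
\begin{align*}
    \Pr[r_{t,l}\leq b]=G_{t,l}(b)=b,
\end{align*}
e.g., $0.3$ when $b=0.3$.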

We consider a hard budget constraint: if processing a task exceeds the remaining budget, the task is discarded and the consumed resource is not recovered. Following the conventional large-scale setting proposed by~\citet{alaei2013online,alaei2014bayesian} for GMP, we assume that $\frac{1}{L}\sum_{l\in[L]}\sum_{t\in[T]}\mathbb{E}[r_{t,l}]\leq K$, i.e., the decision maker's resource is, on average, sufficient to process all tasks. {\color{black}The large-scale setting assumption of the GMPMW reflects real-world engineering and operational problems in which a reasonable budget
is allocated to the decision maker.\footnote{{\color{black}For example, in project portfolio selection or maintenance scheduling,
organizations often plan a budget that is, in expectation, sufficient to cover most (if not all) tasks, while the resources must still be allocated wisely.}}} We further assume that $K\geq1$, as is conventional in GMP.
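
To make the large-scale assumption concrete, consider a hypothetical instance (the numbers are chosen only for illustration) with $L=2$ workers and $T=4$ tasks, in which $\mathbb{E}[r_{t,1}]=0.5$ and $\mathbb{E}[r_{t,2}]=0.25$ for every task $t$. Then
\begin{align*}
    \frac{1}{L}\sum_{l\in[L]}\sum_{t\in[T]}\mathbb{E}[r_{t,l}]
    =\frac{1}{2}\left(4\cdot 0.5+4\cdot 0.25\right)
    =1.5,
\end{align*}
so the assumption holds for this instance whenever $K\geq 1.5$.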


Our objective is to maximize the accumulated reward over the incoming tasks under the resource budget constraint. We formulate our optimization problem as follows:
\begin{align}
    \max_{x_{t,l}, \forall t,l}\quad &\sum_{t\in[T]}\sum_{l\in[L]}x_{t,l}u_{t,l},\label{eq:0-1}\\
    \text{s.t.}\quad &\sum_{t\in[T]}\sum_{l\in[L]}x_{t,l}r_{t,l}\leq K,\label{eq:0-2}\\
    &\sum_{l\in[L]} x_{t,l}\leq1, \forall t\in[T],\label{eq:0-3}\\
    &x_{t,l}\in\{0,1\}, \forall l\in[L],t\in[T],\label{eq:0-4}\\
    &r_{t,l}\sim\mathcal{R}_{t,l}, \forall l\in[L],t\in[T],\label{eq:0-5}
\end{align}
where constraint~(\ref{eq:0-2}) enforces the resource budget, constraints~(\ref{eq:0-3}) specify that at most one worker is assigned to process each task, constraints~(\ref{eq:0-4}) give the decision space, and constraints~(\ref{eq:0-5}) specify the distributions of resource consumption.
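
As a small worked instance, with all values hypothetical and chosen only for illustration, let $T=2$, $L=2$, and $K=1$, and fix realized values $u_{1,1}=3$, $u_{1,2}=2$, $u_{2,1}=1$, $u_{2,2}=4$ and $r_{1,1}=0.7$, $r_{1,2}=0.4$, $r_{2,1}=0.2$, $r_{2,2}=0.6$. Enumerating the assignments that process both tasks gives
\begin{align*}
    x_{1,1}=x_{2,1}=1:&\quad \textstyle\sum x_{t,l}r_{t,l}=0.9\leq 1,\quad \sum x_{t,l}u_{t,l}=4,\\
    x_{1,1}=x_{2,2}=1:&\quad \textstyle\sum x_{t,l}r_{t,l}=1.3> 1\quad\text{(infeasible)},\\
    x_{1,2}=x_{2,1}=1:&\quad \textstyle\sum x_{t,l}r_{t,l}=0.6\leq 1,\quad \sum x_{t,l}u_{t,l}=3,\\
    x_{1,2}=x_{2,2}=1:&\quad \textstyle\sum x_{t,l}r_{t,l}=1.0\leq 1,\quad \sum x_{t,l}u_{t,l}=6.
\end{align*}
Processing only one task yields a reward of at most $4$, so for these realized values the optimum is $6$, attained by $x_{1,2}=x_{2,2}=1$.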


\subsection{Online Environment and Competitive Ratio}

In GMPMW, each task sequence $I$ is arranged by an adversary, who chooses $T$, $\{u_{t,l}\}$, and $\{\mathcal{R}_{t,l}\}$.
A task sequence $I$ is random due to the distributions $\{\mathcal{R}_{t,l}\}$. We define $\Omega^{I}$ as the sample space of $I$ and $i\in\Omega^{I}$ as a sample path of the task sequence $I$. A sample path $i$ consists of a set of realizations $r_{t,l}$ drawn from $\{\mathcal{R}_{t,l}\}$.


The decision maker does not know $T$, $\{u_{t,l}\}$, or $\{\mathcal{R}_{t,l}\}$ in advance, since these values are chosen by the adversary. The length $T$ of the task sequence is revealed only when no more tasks arrive; the probability distribution of resource consumption $\mathcal{R}_{t,l}$ is revealed, together with its PDF $g_{t,l}(\cdot)$, when task $t$ arrives, which, as explained previously, is a distinct feature of GMP and GMPMW; and the reward $u_{t,l}$ and the exact resource consumption $r_{t,l}$ become known to the decision maker only after task $t$ is processed by worker $l$. Note that $u_{t,l}$ and $r_{t,l}$ remain unknown to the decision maker if task $t$ is not processed by worker $l$.


For a given sample path $i$, we denote the performance of an online algorithm $\text{ALG}$ as $\text{ALG}(i)$ and denote the optimal performance of the offline algorithm as $\text{OPT}(i)$. 
We assume that $i$ is fully known to $\text{OPT}$ in advance.
We denote the average performance of the online algorithm $\text{ALG}$ over all sample paths $i$ of the task sequence $I$ as $\mathbb{E}_{i\sim I}[\text{ALG}(i)]$. 
We use the competitive ratio as the primary metric for assessing the performance of an online algorithm. Specifically, $\text{ALG}$ is $\alpha$-\textit{competitive} if
\begin{align}
    \mathbb{E}_{i\sim I}[\text{ALG}(i)]\geq \alpha \max_{i\in \Omega^I}[\text{OPT}(i)], \forall I\in\mathcal{I},\label{eq:online1}
\end{align}
where $\mathcal{I}$ is the set of all possible task sequences.
This definition of the competitive ratio was previously adopted by~\citet{alaei2013online,alaei2014bayesian,srinivasan2022generalized}. Note that this definition is stronger than another commonly used definition~\citep{buchbinder2009design}:
\begin{align}
    \mathbb{E}_{i\sim I}[\text{ALG}(i)]\geq \alpha \mathbb{E}_{i\sim I}[\text{OPT}(i)], \forall I\in\mathcal{I},\label{eq:online2}
\end{align}
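as (\ref{eq:online1}) implies (\ref{eq:online2}), because $\max_{i\in \Omega^I}[\text{OPT}(i)]\geq\mathbb{E}_{i\sim I}[\text{OPT}(i)]$ for every task sequence $I$.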
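
The gap between the two definitions can be seen in a stylized example. Ignoring the continuity assumption on $\mathcal{R}_{t,l}$ for simplicity, suppose (hypothetically) that $\Omega^I$ contains two equally likely sample paths $i_1$ and $i_2$ with $\text{OPT}(i_1)=6$ and $\text{OPT}(i_2)=2$, and that $\mathbb{E}_{i\sim I}[\text{ALG}(i)]=3$. Then
\begin{align*}
    \max_{i\in \Omega^I}[\text{OPT}(i)]=6,\qquad
    \mathbb{E}_{i\sim I}[\text{OPT}(i)]=\tfrac{1}{2}\cdot 6+\tfrac{1}{2}\cdot 2=4,
\end{align*}
so on this task sequence (\ref{eq:online1}) permits at most $\alpha=3/6=0.5$, whereas (\ref{eq:online2}) permits up to $\alpha=3/4=0.75$.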

