\section{Introduction}\label{sec:intro}

The Generalized Magician's Problem (GMP)~\citep{alaei2013online,alaei2014bayesian} is a classical online optimization problem in which a decision maker (the magician) chooses which tasks from a sequence to process so as to obtain as much reward as possible. 
Each task has a resource consumption (cost) and a reward, both initially unknown to the decision maker and revealed in stages as the task arrives and is completed. 



However, the original GMP inherently assumes a \textit{single worker} that processes all accepted tasks. It therefore fails to capture many practical scenarios in which tasks may be processed by different workers, each with different capabilities and resource profiles that influence reward and resource consumption. A generalization of the GMP to multiple workers is well motivated by real-world scenarios in various areas, {\color{black} among which a representative example is task processing in cloud computing. In this application, a cloud provider offers a machine learning (ML) inference service. It receives ML task requests, for which different ML models produce different accuracies and require different amounts of energy. 
The cloud provider needs to decide which tasks to accept or reject, and which model to use to process each accepted task, in order to maximize the accumulated accuracy. In Section~\ref{sec:exp} (and Appendix~\ref{sec:A.D.2}), this scenario is used as a case study to evaluate our proposed algorithm. Other applications of the GMP with multiple workers include} labor outsourcing with different workers and online ad placement with different types of ads. Further details on these applications can be found in Appendix~\ref{sec:A.A}.

In this paper, we introduce the Generalized Magician's Problem with Multiple Workers (GMPMW): Each task can be processed by one of several workers. 
Different workers generate different amounts of reward while consuming different amounts of resources.
The decision maker must decide whether to accept each task and, if so, which worker to assign it to, in order to
maximize the accumulated reward within a resource budget. To the best of our knowledge, this is the first work to consider multiple workers in the GMP.

The GMPMW is substantially more complex than the original GMP. The added complexity arises from the need to balance resource consumption across different workers while relying only on limited and incrementally revealed knowledge of tasks, a challenge absent in the single-worker GMP. This challenge is likewise absent in other online problems, because the GMPMW is unique in considering multiple stages of observability of a task's cost: zero initial knowledge, then a probability distribution upon task arrival, and finally the exact value upon task completion (see details in Section~\ref{sec:rl}).

To overcome these challenges, we develop the Online Worker Assignment (OWA) algorithm, which employs a balanced probability-fitting approach. 
OWA first balances the workers' resource consumption by optimally solving a problem with a non-convex constraint, whose solution then guides online resource allocation through a set of assignment guarantees. OWA then tracks the virtual resource consumption, which captures the joint evolution of resource usage across workers and serves as the basis for planning the overall resource usage across tasks. Consequently, OWA balances resource consumption across workers and maximizes resource utilization to effectively handle uncertain rewards and fluctuating resource demands. 

\paragraph{Main Results:}
\begin{itemize}
    \item OWA establishes a novel framework for addressing the GMPMW. We derive the competitive ratio $\alpha$ of OWA, along with a closed-form lower bound $\alpha'=\max\{{1}/{L},c\}\cdot(1-K^{-\frac{1}{2}})$, where $L$ is the number of workers, $K$ is the resource budget, and $c$ is a constant derived from the problem instance. When there is only one worker, i.e., $L=1$, this lower bound is consistent with the previous best result on the GMP~\citep{alaei2013online}. Furthermore, we show that $\alpha$ is attained in certain problem instances, so it is a tight performance bound. 
    \item We prove that when the reward lower bound for each worker is $0$, OWA is asymptotically optimal, meaning that $\alpha$ approaches the best possible competitive ratio as the resource budget increases.
    This result underscores the effectiveness of OWA's design and suggests an advantage over alternative algorithms.
    \item For the numerical case study, we perform trace-driven experiments on real-time video analytics over edge devices. These experiments validate the theoretical result and demonstrate OWA's efficiency in leveraging multiple workers in GMPMW.
\end{itemize}
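
As a purely illustrative reading of the closed-form lower bound $\alpha'$ stated above, consider the hypothetical values $L=4$, $K=100$, and $c=0.3$. These numbers are assumptions chosen for arithmetic convenience and are not taken from our experiments. Then
\[
\alpha' = \max\{1/4,\, 0.3\}\cdot\bigl(1-100^{-\frac{1}{2}}\bigr) = 0.3\cdot 0.9 = 0.27 .
\]
With the same $L$ and $c$, increasing the budget to $K=10^{4}$ gives $\alpha' = 0.3\cdot 0.99 = 0.297$. The factor $(1-K^{-\frac{1}{2}})$ tends to $1$ as $K$ grows, so $\alpha'$ tends to $\max\{1/L,\, c\}$.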
