\section{Online Worker Assignment (OWA) Algorithm Design}\label{sec:4}


In this section, we present our Online Worker Assignment (OWA) framework for solving GMPMW. 
OWA is an online algorithm that addresses the complicated tradeoff between accumulating rewards and balancing limited resources among multiple workers. Its design must fully exploit the progressively improving observability of task costs, which makes it significantly more challenging than, and distinct from, existing solutions for the original GMP and other online problems.

The OWA algorithm (summarized in Alg.~\ref{alg:newmain}) is a balanced probability-fitting approach consisting of four main phases: Pre-Calculation, Worker-Assignment, Processing, and Baseline-Calibration. \textcircled{1} Before the first task arrives, OWA enters the pre-calculation phase (Line~\ref{1-1-3}), where it calculates an \emph{assignment guarantee} $\gamma_l^*$ for each worker $l$ based on reward bounds and the resource budget (Eqs.~(\ref{eq:p1start})--(\ref{eq:p1end})), which later guides the decision-making in the online process by balancing online resource consumption across workers (Line~\ref{1-0-10}). \textcircled{2} When task $t$ arrives, we enter the worker-assignment phase (Lines~\ref{1-0-9}--\ref{1-1-8}), where OWA compares the \emph{resource utilization level} $\Theta_{t}$ against a \emph{resource utilization baseline} $\theta_{t}$ to decide whether to accept this task. \textcircled{3} In the processing phase (Lines~\ref{1-0-16}--\ref{1-1-17}), the task is processed or discarded according to the decision. \textcircled{4} In the baseline-calibration phase (Lines~\ref{1-1-19}--\ref{2-0-7}), OWA first updates the \textit{virtual resource consumption} to capture the long-term resource availability and the joint evolution of different workers, accounting for both the randomness of task cost and the randomization of online decisions. It then derives a new baseline $\theta_{t+1}$ for the next decision, carefully fitting the probability of processing the next task to the long-term resource availability, striking a balance between accumulating rewards and preserving sufficient resources for future high-reward tasks.

\begin{algorithm}[t]
\caption{Online Worker Assignment (OWA) Algorithm}
\label{alg:newmain}

\textbf{INPUT:} $L$, $K$, $\{\underline{u}_l\}$ and $\{\Bar{u}_l\}$\label{1-0-1}

\begin{algorithmic}[1]

\STATE $t\leftarrow1$, $\theta_{1}\leftarrow0$, $\Theta_{1}\leftarrow0$, $x_{t,l}\leftarrow0$, $\phi_1\leftarrow1$\label{1-0-3}
\STATE \textit{\# Pre-Calculation Phase}
\STATE Calculate $\gamma_l^*$ by solving optimization problem $\mathtt{P}_1$ in Eqs.~(\ref{eq:p1start})--(\ref{eq:p1end}).\label{1-1-3}
\WHILE{task $t$ arrives ($g_{t,l}$ is observed)\label{1-0-6}}\label{1-0-5}
    \STATE \textit{\# Worker-Assignment Phase}\label{1-0-7}
    \IF{$\Theta_{t}\leq\theta_{t}$ and $K-\Theta_{t}\geq1$}\label{1-0-9}
        \STATE Select a worker $l$ with probability $\gamma_l^*/\phi_t$ and set $x_{t,l}\leftarrow1$.\label{1-0-10}
    \ENDIF\label{1-1-8}
    \STATE \textit{\# Processing Phase}\label{1-0-15}
    \IF{$\sum_{l}x_{t,l}=1$}\label{1-0-16}
        \STATE The selected worker $l$ processes task $t$.\label{1-0-17}
        \STATE (Reward $u_{t,l}$ and resource consumption $r_{t,l}$ are revealed).\label{1-0-18}
        \STATE $\Theta_{t+1}\leftarrow \Theta_{t}+r_{t,l}$\label{1-0-19}
    \ELSE\label{1-0-21}
        \STATE Discard task $t$\label{1-0-22}
        \STATE $\Theta_{t+1}\leftarrow \Theta_{t}$\label{1-0-23}
    \ENDIF\label{1-1-17}
    \STATE \textit{\# Baseline-Calibration Phase}\label{1-0-25}
    \IF{$t=1$}\label{1-1-19}
        \STATE Initialize $h_{1}(\cdot)$ as $\delta(\cdot)$.\label{2-0-3}
    \ENDIF
    \STATE Calculate $h_{t+1}(\cdot)$ by Eq.~(\ref{eq:h2}).\label{2-0-6}
    \STATE Calculate $\theta_{t+1}$ and $\phi_{t+1}$ by Eqs.~(\ref{eq:theta})--(\ref{eq:phi}).\label{2-0-7}
    \STATE $t\leftarrow t+1$\label{1-0-27}
\ENDWHILE
\end{algorithmic}
\end{algorithm}

\subsection{Pre-Calculation Phase}

Before the first task arrives, OWA starts in the pre-calculation phase to determine an assignment guarantee $\gamma_l^*$ for each worker $l$. It is the optimal solution to the following problem:
\begin{align}
        \mathtt{P}_1\quad\max_{\gamma_l, \forall l}\quad&\min_l\frac{\sum_{l'\in[L]\setminus\{l\}}\gamma_{l'}\underline{u}_{l'}}{\overline{u}_{l}}+\gamma_l\label{eq:p1start}\\
        \textbf{s.t.}\quad\ \  
            &K\geq\frac{1}{(1- \sum_{l\in[L]}\gamma_{l})(1-\max_{l}\gamma_{l}L)},\label{eq:9}\\
            &\max_{l}\gamma_{l}<\frac{1}{L},\\
            &\gamma_l\geq0,\ \forall l.\label{eq:p1end}
    \end{align}
This optimization problem captures the limited prior knowledge of task rewards and costs while accounting for the impact of each decision on future resource dynamics, ensuring that the assignment guarantees optimally balance resource consumption across workers in online decision-making. The rationale for this choice of $\gamma_l^*$ is explained in Section~\ref{sec:CR}, where we derive the competitive ratio of OWA.

Since the decision maker knows $L$, $K$, $\underline{u}_{l}$, and $\overline{u}_l$, the optimization problem $\mathtt{P}_1$ is solved offline before any task arrives. Note that although problem $\mathtt{P}_1$ has a non-convex constraint~(\ref{eq:9}), its optimal solution can still be obtained using standard convex optimization solvers, as detailed in Section~\ref{sec:CR}. Throughout this paper, we denote the optimal objective value of $\mathtt{P}_1$ by $\mathtt{P}_1(\{\gamma_l^*\})$, i.e.,
        $\mathtt{P}_1(\{\gamma_l^*\})=\min_l\bigl[(\sum_{l'\in[L]\setminus\{l\}}\gamma_{l'}^*\underline{u}_{l'})/\overline{u}_{l}+\gamma_l^*\bigr]$.
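
As a quick sanity check of the constraints and the objective of $\mathtt{P}_1$, consider a hypothetical instance chosen purely for illustration: $L=2$, $K=10$, $\underline{u}_1=\underline{u}_2=1$, and $\overline{u}_1=\overline{u}_2=2$. The candidate $\gamma_1=\gamma_2=0.2$ is an assumed feasible point and is not claimed to be optimal. It satisfies $\max_l\gamma_l=0.2<1/L=0.5$ and
\begin{align*}
    \frac{1}{(1-\sum_{l\in[L]}\gamma_l)(1-\max_l\gamma_l L)}=\frac{1}{(1-0.4)(1-0.4)}=\frac{1}{0.36}\approx2.78\leq K,
\end{align*}
so it is feasible, and its objective value is
\begin{align*}
    \min_l\left[\frac{\sum_{l'\in[L]\setminus\{l\}}\gamma_{l'}\underline{u}_{l'}}{\overline{u}_l}+\gamma_l\right]=\frac{0.2\times1}{2}+0.2=0.3.
\end{align*}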


\subsection{Worker-Assignment Phase}

When task $t$ arrives, OWA starts the worker-assignment phase (Lines~\ref{1-0-7}--\ref{1-0-10}). 
We use $\Theta_{t}=\sum_{\tau<t}\sum_{l}x_{\tau,l}r_{\tau,l}$ to record the current resource utilization level at the arrival of task $t$. If the remaining amount of resource $K-\Theta_{t}$ is less than $1$, OWA discards task $t$, since the random resource consumption of the task could otherwise exceed the resource budget and make the solution infeasible.
If the remaining amount of resource $K-\Theta_{t}$ is at least $1$ and $\Theta_{t}\leq \theta_{t}$, OWA selects a worker $l$ with probability $\gamma_l^*/\phi_t$ and sets $x_{t,l}=1$, or discards task $t$ with probability $1-\sum_l\gamma_l^*/\phi_t$ and sets $x_{t,l}=0$ for all $l$. Here, $\phi_t$ is an offset parameter initialized to $1$ and updated in the baseline-calibration phase, and its update rule will be discussed shortly in Section~\ref{sec:4.5}. 
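
For illustration, suppose $L=2$, $\gamma_1^*=\gamma_2^*=0.2$, and $\phi_t=0.8$ (hypothetical values, not derived from a particular instance). If $\Theta_t\leq\theta_t$ and $K-\Theta_t\geq1$, the randomized decision for task $t$ is
\begin{align*}
    \Pr[x_{t,1}=1]=\Pr[x_{t,2}=1]=\frac{0.2}{0.8}=0.25,\qquad
    \Pr[\text{discard}]=1-\frac{0.2+0.2}{0.8}=0.5.
\end{align*}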


\subsection{Processing Phase}

After the decision $\{x_{t,l}\}$ is made, OWA enters the processing phase. In this phase, task $t$ is processed by the selected worker $l$ or discarded, and the resource utilization level is updated accordingly.
If task $t$ is processed by worker $l$, its reward $u_{t,l}$ is revealed and received by the decision maker, and its consumption $r_{t,l}$ is realized from the probability distribution $\mathcal{R}_{t,l}$ and added to the resource utilization level, i.e., $\Theta_{t+1}=\Theta_{t}+r_{t,l}$ (Line~\ref{1-0-19}).
Otherwise, the reward $u_{t,l}$ and the resource consumption $r_{t,l}$ of task $t$ for each worker $l$ remain unknown.
The decision maker receives no reward from this task, and the resource utilization level is unchanged, i.e., $\Theta_{t+1}=\Theta_{t}$ (Line~\ref{1-0-23}).

\subsection{Baseline-Calibration Phase}\label{sec:4.5}


In the baseline-calibration phase, given the observed PDF $g_{t,l}(\cdot)$ of the resource consumption of each worker, OWA generates the new resource utilization baseline $\theta_{t+1}$ and the new offset $\phi_{t+1}$ (Lines~\ref{1-0-25}--\ref{2-0-7}). 
Note that this calculation does not depend on whether the task $t$ is processed. This calculation is based on the \textit{virtual resource consumption} $h_t(w)$, which evaluates the impact of all past decisions on future resource consumption by considering all potential task acceptances, task assignments to different workers, and the corresponding resource consumption.
It is initialized as the Dirac delta function $h_1(w)=\delta(w)$,  representing the initial budget before any consumption, and is updated as
\begin{align}
    h_{t+1}(w)=&(1-\sum_{l\in[L]}\gamma_l^*/\phi_t)h_t(w)+(\sum_{l\in[L]}\gamma_l^*/\phi_t)\nonumber\\
    &\times\left[\overline{h}_t (w) * \overline{g}_t(w)+h_t(w) - \overline{h}_t(w)\right].\label{eq:h2}
\end{align}

In Eq.~(\ref{eq:h2}), the first term captures the impact of potentially discarding task $t$ on future resource consumption, while the second term corresponds to the decision to accept it. In the second term, $\overline{h}_t$ is the truncated resource utilization function
\begin{align}
    \overline{h}_t(w)=\begin{cases}
        h_t(w),\ w\leq\theta_t,\\
        0,\ w>\theta_t,
    \end{cases}
\end{align}
capturing the impact of baseline $\theta_t$ on the decision making, $\overline{g}_t$ is the average resource consumption PDF calculated by \begin{align}
    \overline{g}_{t}(w)=(1/\sum_{l\in[L]}\gamma_l^*)\sum_{l\in[L]}\gamma_l^*g_{t,l}(w),\label{eq:barg}
\end{align}
capturing the balanced resource consumption across workers while incorporating the newly improved observability (i.e., distribution) of task costs, and $*$ is the convolution operator, capturing the potential future dynamics of resource consumption caused by processing task $t$.
In this way, the virtual resource consumption $h_{t+1}$ captures the stochastic nature of actual resource consumption and the long-term resource availability, accounting for both the randomness of task costs and the randomization in decision-making.
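
To see how Eq.~(\ref{eq:h2}) operates, consider the first update ($t=1$) under the initialization of Alg.~\ref{alg:newmain}, i.e., $h_1=\delta$, $\theta_1=0$, and $\phi_1=1$. This is a sketch for illustration; it additionally assumes $K\geq1$ so that the budget check for the first task passes. Since all mass of $h_1$ lies at $w=0\leq\theta_1$, we have $\overline{h}_1=h_1=\delta$, and since $\delta*\overline{g}_1=\overline{g}_1$, Eqs.~(\ref{eq:h2}) and (\ref{eq:barg}) give
\begin{align*}
    h_2(w)&=\Bigl(1-\sum_{l\in[L]}\gamma_l^*\Bigr)\delta(w)+\Bigl(\sum_{l\in[L]}\gamma_l^*\Bigr)\left[\overline{g}_1(w)+\delta(w)-\delta(w)\right]\\
    &=\Bigl(1-\sum_{l\in[L]}\gamma_l^*\Bigr)\delta(w)+\sum_{l\in[L]}\gamma_l^*g_{1,l}(w).
\end{align*}
That is, $h_2$ is a mixture of ``no consumption'' with probability $1-\sum_{l}\gamma_l^*$ and ``consumption drawn from $g_{1,l}$'' with probability $\gamma_l^*$, which coincides with the distribution of $\Theta_2$ induced by the randomized decision on the first task.
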
We then calculate $\theta_{t+1}$ and $\phi_{t+1}$ (Line~\ref{2-0-7}) as follows:
\begin{align}
    \theta_{t+1}=&\min\left\{w:\int_{-\infty}^wh_{t+1}(v)dv\geq \sum_{l\in[L]}\gamma_l^*\right\},\label{eq:theta}
    \end{align}
\begin{align}
    \phi_{t+1}=\int_{-\infty}^{\theta_{t+1}}h_{t+1}(v)dv,\label{eq:phi}
\end{align}
in order to ensure that the baseline $\theta_{t+1}$ and offset $\phi_{t+1}$ carefully ``fit'' the resource consumption of processing task $t+1$ to the long-term resource availability, balancing immediate resource utilization with preserving resources for the future.
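
As a hypothetical example, let $\sum_{l\in[L]}\gamma_l^*=0.4$ and suppose that every $g_{1,l}$ is uniform on $(0,1]$ (both are assumptions made only for illustration), so that Eq.~(\ref{eq:h2}) yields $h_2(w)=0.6\,\delta(w)+0.4\cdot\mathbf{1}\{0<w\leq1\}$ after the first task. Then
\begin{align*}
    \int_{-\infty}^{w}h_2(v)dv=
    \begin{cases}
        0,& w<0,\\
        0.6+0.4w,& 0\leq w\leq1,
    \end{cases}
\end{align*}
and the smallest $w$ at which this integral reaches $0.4$ is $w=0$. Hence $\theta_2=0$ and $\phi_2=0.6$. In this example, task $2$ can be assigned only if $\Theta_2\leq0$, i.e., only if task $1$ was discarded, which happens with probability $0.6$. Worker $l$ therefore processes task $2$ with overall probability $0.6\times\gamma_l^*/0.6=\gamma_l^*$, which illustrates how the baseline and the offset jointly fit the processing probability to the assignment guarantee.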


\subsection{OWA Algorithm Complexity Analysis}

We provide a detailed complexity analysis of OWA in Appendix~\ref{sec:complexity} and offer a summary here. The complexity of the pre-calculation phase is $\mathcal{O}(L^3)$ when the optimal solution to $\mathtt{P}_1$ is obtained by the \emph{barrier method}~\citep{boyd2004convex}. The complexity of the worker-assignment phase is $\mathcal{O}(1)$.
In the baseline-calibration phase, the random variables $\mathcal{R}_{t,l}$ can be continuous, with continuous PDFs $\{g_{t,l}\}$. In practice, we can discretize them with arbitrary granularity.  Let $D$ be the number of levels used for discretization (a larger $D$ implying higher accuracy).
The complexity of this phase is $\mathcal{O}(D\log(D))$. We note that the pre-calculation phase is performed offline before online tasks arrive, whereas the worker-assignment, processing, and baseline-calibration phases are performed online. Accordingly, the overall complexity of OWA for each task is $\mathcal{O}(D\log(D))$. 
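
For a sense of scale, take $D=1024$ discretization levels (an illustrative value, not a recommendation). Ignoring constant factors,
\begin{align*}
    D\log_2(D)=1024\times10=10240,\qquad D^2=1048576\approx1.05\times10^{6},
\end{align*}
where $D^2$ is the operation count of evaluating a discretized convolution directly from its definition. The per-task cost thus grows almost linearly in the discretization granularity.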

