\section{Comprehensive Comparison of GMP and GMPMW with Related Problems}\label{sec:A.B}

\begin{figure}[t]
    \centering
    \includegraphics[width=0.99\linewidth]{Figures/survey.pdf}
    \caption{(Reproduced from Figure~\ref{fig:survey}) Progression of task cost observability in online problems.}
    \label{fig:surveyApdx}
\end{figure}

A distinct feature of GMP and GMPMW is their progression of task cost observability, that is, the key instants in online decision-making at which the observability of task costs improves, together with the extent of those improvements. GMP and GMPMW are unique in that their progression involves all three levels of task cost observability: the cost is initially unknown, becomes partially observable as a probability distribution upon task arrival, and is fully revealed as an exact value upon task completion.

To further clarify this distinction, Figure~\ref{fig:surveyApdx} compares the progression of task cost observability across different problems. Task cost observability may increase at four key instants: before task arrival, upon task arrival, upon discarding, and after processing. At each instant, it falls into one of three levels: Unknown (no information), Distribution (probability distribution known), and Value (exact value known). We compare GMP and GMPMW (red) against the Online Generalized Assignment Problem (OGAP), Online Stochastic Generalized Assignment Problem (OSGAP), Online Knapsack Problem (OKP), Online Stochastic Knapsack Problem (OSKP), and the Bandit with Knapsack (BwK) problem, including its Full Feedback (BwK-F) and Bandit Feedback (BwK-B) variants. Unlike the other problems, which experience at most two levels of task cost observability, GMP and GMPMW are the only ones that undergo the full progression involving all three levels.
Figure~\ref{fig:survey} summarizes Tables~\ref{tab:survey1}--\ref{tab:survey3}, where we compare the progression of observability of both task reward and cost across different problems. {\color{black}For conciseness, we use ``R'' to denote the observability of task reward and ``C'' to denote the observability of task cost throughout Tables~\ref{tab:survey1}--\ref{tab:survey3}.} The details of this comparison, along with the tables, are discussed in the remainder of this section. 
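
The levels and instants introduced above can also be written compactly. As an illustrative notation (assumed here and not used elsewhere in the paper), let $\mathsf{U}$, $\mathsf{D}$, and $\mathsf{V}$ denote the levels Unknown, Distribution, and Value, and record the task cost observability of a problem as a tuple ordered as (before arrival, upon arrival, upon discarding, upon processing). Reading the cost entries off Tables~\ref{tab:survey1}--\ref{tab:survey2} gives
\[
\begin{array}{ll}
\mbox{GMP and GMPMW:} & (\mathsf{U},\mathsf{D},\mathsf{D},\mathsf{V}),\\
\mbox{OGAP and OKP:} & (\mathsf{U},\mathsf{V},\mathsf{V},\mathsf{V}),\\
\mbox{OSGAP:} & (\mathsf{D},\mathsf{D},\mathsf{D},\mathsf{V}),\\
\mbox{OSKP (stochastic cost):} & (\mathsf{D},\mathsf{V},\mathsf{V},\mathsf{V}),\\
\mbox{OSKP (deterministic cost):} & (\mathsf{V},\mathsf{V},\mathsf{V},\mathsf{V}),\\
\mbox{BwK-B:} & (\mathsf{U},\mathsf{U},\mathsf{U},\mathsf{V}),\\
\mbox{BwK-F:} & (\mathsf{U},\mathsf{U},\mathsf{V},\mathsf{V}).
\end{array}
\]
In this notation, the tuple of GMP and GMPMW is the only one that contains all three symbols; every other tuple contains at most two.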


\subsection{GMP, OGAP, and OSGAP}

\begin{table*}[t]
\centering
  \caption{Task Observability of Rewards {\color{black}(R)} and Costs {\color{black}(C)} in GMP, OGAP, and OSGAP}
  \label{tab:survey1}
  \begin{tabular}{|l|c|c|c|c|p{5cm}|}  
    \toprule
    \makecell{Online\\Problems} & \makecell{Before\\Arrival} & \makecell{Upon\\ Arrival} & \makecell{Upon\\Discarding} & \makecell{Upon\\Processing}& Typical Work \\
    
    \midrule
    \makecell{GMP and\\GMPMW}  &\makecell{R: Unknown.\\C: Unknown.} & \makecell{R: Unknown.\\ C: Distribution.} & \makecell{R: Unknown.\\C: Distribution.} & \makecell{R: Value.\\C: Value.}& \citep{alaei2013online,alaei2014bayesian}\\  
    
    \midrule
    \makecell{OGAP}  & \makecell{R: Unknown.\\ C: Unknown.} & \makecell{R: Value.\\C: Value.} & \makecell{R: Value.\\C: Value.}& \makecell{R: Value.\\C: Value.}&\citep{liu2023online,li2023sample}\\
    
    \midrule
    \makecell{OSGAP}&\makecell{R: Distribution.\\C: Distribution.}&\makecell{R: Value.\\C: Distribution.}&\makecell{R: Value.\\C: Distribution.}& \makecell{R: Value.\\C: Value.}&\citep{alaei2013online,yoshinaga2023size}\\
      
    
    \bottomrule
  \end{tabular}
\end{table*}

In this subsection, we first introduce the origins and history of GMP, followed by an introduction to OGAP and OSGAP. We also compare the observability of task reward and cost in these problems, which is summarized in Table~\ref{tab:survey1}. 

In GMP, which was originally introduced by~\citet{alaei2013online}, a decision-maker needs to decide which tasks to process in a task sequence in order to maximize the cumulative reward within the resource budget $K$. \citet{alaei2013online} studied GMP and achieved a competitive ratio of $1-K^{-1/2}$. 
In GMP, the task reward is unobservable until processing: the decision-maker has no knowledge of the task reward before or upon task arrival, and the reward is revealed only after processing. The task cost has progressively improving observability: the decision-maker has no knowledge of the task cost before task arrival, learns its distribution upon arrival, and learns its exact value only after processing the task.
A special case of GMP with random $0$-$1$ costs is the Magician's Problem (MP), which was subsequently studied by~\citet{alaei2014bayesian}, who achieved a competitive ratio of $1-(K+3)^{-1/2}$. Recently, \citet{srinivasan2022generalized} studied a variant of GMP considering scenarios with unknown distributions of resource consumption. Owing to the large-scale setting and the unique observability of tasks, GMP has found widespread application in various fields, including e-commerce~\citep{amil2022multi} and transportation~\citep{jiang2022approximation}. However, none of the above works can accommodate the multiple workers in GMPMW.
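
To give a sense of scale for the two competitive ratios quoted above, the following values are obtained by direct substitution. The particular budgets $K$ are chosen here only for illustration, and the two ratios apply to different problems (GMP and its special case MP, respectively).
\[
\begin{array}{c|ccc}
K & 1 & 6 & 100 \\
\hline
1-K^{-1/2} & 0 & \approx 0.592 & 0.9 \\
1-(K+3)^{-1/2} & 0.5 & \approx 0.667 & \approx 0.901
\end{array}
\]
Both ratios approach $1$ as $K$ grows, which is consistent with the emphasis on the large-scale setting.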

The unique task cost observability and the large-scale setting also enabled GMP to be adopted to tackle other online problems, such as OSGAP~\citep{alaei2013online,alaei2014bayesian}, which is also studied in the large-scale setting. In OSGAP, each task belongs to a type drawn from a known distribution. A task's type determines its reward and cost distribution, both known in advance. Therefore, by knowing the distribution of task types, the decision-maker also knows the distributions of both task reward and cost.
Upon arrival, the task's type is revealed, specifying the reward and the cost distribution. Recently, \citet{yoshinaga2023size} extended OSGAP to consider a more limited resource budget while maintaining the large-scale setting. \citet{liu2023online} and~\citet{li2023sample} studied OGAP as an adversarial variant of OSGAP, where the reward and cost of each task are arranged by an adversary and are revealed only upon arrival.
However, as shown in Figure~\ref{fig:surveyApdx} and Table~\ref{tab:survey1}, despite the differing initial knowledge of task costs in OSGAP (distribution) and OGAP (none), both problems follow a similar observability progression: the exact task cost is fully revealed immediately upon arrival (OGAP) or upon processing (OSGAP), without an intermediate level. Consequently, none of these problems considers the full progression of task cost observability involving all three levels as in GMP and GMPMW.

\begin{table*}
\centering
  \caption{Task Observability of Rewards {\color{black}(R)} and Costs {\color{black}(C)} in OKP, OSKP, and BwK}
  \label{tab:survey2}
  \begin{tabular}{|l|c|c|c|c|p{5cm}|}  
    \toprule
    \makecell{Online\\Problems} & \makecell{Before\\Arrival} & \makecell{Upon\\ Arrival} & \makecell{Upon\\Discarding} & \makecell{Upon\\Processing}& Typical Work \\
    
    \midrule
    \makecell{OKP}&\makecell{R: Unknown.\\C: Unknown.}&\makecell{R: Value.\\C: Value.}&\makecell{R: Value.\\C: Value.}& \makecell{R: Value.\\C: Value.}&\citep{zhou2008budget,cao2020optimal}\\
    
    \midrule
    \makecell{OSKP\\(Stochastic \\Reward and\\Cost)}&\makecell{R: Distribution.\\C: Distribution.}&\makecell{R: Value.\\C: Value.}&\makecell{R: Value.\\C: Value.}& \makecell{R: Value.\\C: Value.}&\citep{papastavrou1996dynamic}\\
    
    \midrule
    \makecell{OSKP\\(Stochastic \\Reward)}&\makecell{R: Distribution.\\C: Value.}&\makecell{R: Value.\\C: Value.}&\makecell{R: Value.\\C: Value.}& \makecell{R: Value.\\C: Value.}&\citep{jiang2022tight}\\
    
    \midrule
    \makecell{OSKP\\(Stochastic \\Cost)}&\makecell{R: Value.\\C: Distribution.}&\makecell{R: Value.\\C: Value.}&\makecell{R: Value.\\C: Value.}& \makecell{R: Value.\\C: Value.}&\citep{dean2008approximating}\\

    \midrule
    \makecell{BwK-B} & \makecell{R: Unknown.\\C: Unknown.}&\makecell{R: Unknown.\\C: Unknown.}& \makecell{R: Unknown.\\C: Unknown.}& \makecell{R: Value.\\C: Value.}&\citep{badanidiyuru2018bandits,immorlica2022adversarial}\\

    \midrule
    \makecell{BwK-F} & \makecell{R: Unknown.\\C: Unknown.}&\makecell{R: Unknown.\\C: Unknown.}& \makecell{R: Value.\\C: Value.}& \makecell{R: Value.\\C: Value.}&\citep{badanidiyuru2018bandits,immorlica2022adversarial}\\
    
    
    \bottomrule
  \end{tabular}
\end{table*}


\subsection{OKP, OSKP, and BwK}
In this subsection, we introduce OKP, OSKP, and BwK, and discuss the task observability in these problems, which is summarized in Table~\ref{tab:survey2}.
In OKP~\citep{zhou2008budget,cao2020optimal}, the decision-maker also chooses which tasks in a sequence to process in order to maximize the accumulated reward within the cost budget. OKP~\citep{zhou2008budget,bockenhauer2014online} assumes that the decision-maker has no information on the reward and cost of tasks except the bounded ratio between task reward and cost, and that the reward and cost of a task are fully revealed upon task arrival. A stochastic variant of OKP is the Online Stochastic Knapsack Problem (OSKP). In OSKP, task costs are either deterministic (with stochastic rewards) or stochastic (with deterministic or stochastic rewards). In the deterministic cost setting~\citep{jiang2022tight}, costs are known to the decision-maker at the beginning, while in the stochastic cost setting~\citep{papastavrou1996dynamic,dean2008approximating}, the cost of each task follows a known distribution, with its exact value revealed upon task arrival. However, as shown in Figure~\ref{fig:surveyApdx} and Table~\ref{tab:survey2}, despite the differing initial knowledge of task costs in the stochastic-cost OSKP (distribution) and OKP (none), both problems follow the same observability progression: the exact task cost is fully revealed immediately upon arrival, without an intermediate level. Moreover, as shown in Table~\ref{tab:survey2}, in the stochastic-reward OSKP, where task costs are deterministic, the decision-maker knows the task costs even before task arrival.
Consequently, none of these problems considers the full progression of task cost observability involving all three levels as in GMP and GMPMW.

In BwK~\citep{badanidiyuru2018bandits,immorlica2022adversarial,dragobandits}, the decision-maker does not know a task's reward or cost even upon its arrival. Two feedback settings are considered: bandit feedback (BwK-B), where both values are revealed only after processing the task, and full feedback (BwK-F), where they are revealed after the task is either processed or discarded. However, as shown in Figure~\ref{fig:surveyApdx} and Table~\ref{tab:survey2}, both BwK-B and BwK-F start with no knowledge of task costs and follow a similar observability progression, where the exact task cost is directly revealed only after the task is processed (BwK-B) or after it is either processed or discarded (BwK-F), without an intermediate level. Consequently, none of these problems considers the full progression of task cost observability involving all three levels as in GMP and GMPMW.


\subsection{The Online Contention Resolution Scheme}

Recently,~\citet{feldman2021online} introduced the Online Contention Resolution Scheme (OCRS) as a rounding scheme for solving online submodular function optimization problems. The core of OCRS involves generating a fractional solution for a linear relaxation of the problem using available prior knowledge, followed by a stochastic and sequential rounding process to derive a feasible integer solution for the original online problem. OCRS has been found capable of solving instances of the Online Bayesian Optimization Problem (BOP), including the Online Multi-Unit Prophet Inequality (OMuPI)~\citep{feldman2021online,jiang2022tight}, the Constrained Oblivious Posted Price Mechanisms Problem (COPM)~\citep{chawla2010multi,feldman2021online}, and the Stochastic Probing Problem~\citep{gupta2013stochastic,feldman2021online}. 
In OMuPI, task rewards follow known distributions and are revealed upon arrival, with task costs known to the decision-maker at the beginning.
In both COPM and the Stochastic Probing Problem, task rewards follow known distributions and are revealed only after processing, with task costs known to the decision-maker at the beginning.
Later,~\citet{jiang2022tight} adopted OCRS to solve a deterministic-cost OSKP. 
However, as shown in Figure~\ref{fig:surveyApdx} and Tables~\ref{tab:survey2}--\ref{tab:survey3}, all of these works consider deterministic task costs known to the decision-maker at the beginning. 
Moreover, the absence of prior knowledge about task rewards and costs distinguishes GMP and GMPMW from BOP, making it impossible to construct a meaningful linear relaxation in advance for GMP. Consequently, GMP and GMPMW require fundamentally different approaches from OCRS.
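
As a simplified illustration of the kind of relaxation that prior knowledge makes possible (a sketch in assumed notation, not the exact relaxation used in the works cited above), consider $n$ tasks with deterministic costs $c_i$ known at the beginning, rewards with known means $\mu_i$ that are revealed only after processing, and a budget $K$. Letting $x_i$ denote the probability that task $i$ is processed, a natural linear relaxation is
\[
\max_{x \in [0,1]^n} \; \sum_{i=1}^{n} \mu_i x_i
\quad \mbox{subject to} \quad
\sum_{i=1}^{n} c_i x_i \le K .
\]
Every coefficient of this program, namely each $\mu_i$ and each $c_i$, must be available before the first task arrives. In GMP and GMPMW, neither the task rewards nor the task costs are known in advance, so a program of this form cannot be written down ahead of time.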

\begin{table*}
\centering
  \caption{Task Observability of Rewards {\color{black}(R)} and Costs {\color{black}(C)} in OMuPI, COPM, and the Stochastic Probing Problem}
  \label{tab:survey3}
  \begin{tabular}{|l|c|c|c|c|p{5cm}|}  
    \toprule
    \makecell{Online\\Problems} & \makecell{Before\\Arrival} & \makecell{Upon\\ Arrival} & \makecell{Upon\\Discarding} & \makecell{Upon\\Processing}& Typical Work \\
    
    \midrule
    \makecell{OMuPI}&\makecell{R: Distribution.\\C: Value.}&\makecell{R: Value.\\C: Value.}&\makecell{R: Value.\\C: Value.}& \makecell{R: Value.\\C: Value.}&\citep{feldman2021online,jiang2022tight}\\
    
    \midrule
    \makecell{COPM}&\makecell{R: Distribution.\\C: Value.}&\makecell{R: Distribution.\\C: Value.}&\makecell{R: Distribution.\\C: Value.}& \makecell{R: Value.\\C: Value.}&\citep{chawla2010multi,feldman2021online}\\

    \midrule
    \makecell{Stochastic\\Probing}&\makecell{R: Distribution.\\C: Value.}&\makecell{R: Distribution.\\C: Value.}&\makecell{R: Distribution.\\C: Value.}& \makecell{R: Value.\\C: Value.}&\citep{gupta2013stochastic,feldman2021online}\\
    
    
    
    \bottomrule
  \end{tabular}
\end{table*}

\subsection{Other Online Optimization Problems}\label{sec:A.B.1}

Further to Section~\ref{sec:rl}, here we summarize other online optimization problems that are less relevant to GMP and GMPMW.
The One-Way Trading Problem (OTP) involves continuous resource consumption with task reward arriving in an adversarial order, considering an infinite time horizon and resource consumption controlled by the decision maker~\citep{cao2020optimal,el2001optimal,lin2019competitive}. Compared with the OTP, GMP and GMPMW are distinct in their unknown and limited time horizon and uncertain resource consumption.
The Online Bipartite Matching Problem deals with the matching of online tasks and offline machines, where the time horizon is known to the decision maker~\citep{mehta2013online,dickerson2021allocation,ijcai2023p607}. 
The online Pandora's Box problem takes a different approach: the decision maker strategically reveals item values in order to maximize the maximum value (rather than the sum of the values) of the revealed items minus the cost of revealing them, with no limit on the total cost, a controlled item arrival order, and a known number of items~\citep{boodaghians2020pandora,esfandiari2019online,gatmiry2024bandit}. These problems are structurally different from GMP and GMPMW.

