

{\color{black}\section{Evaluation of OWA by Case Study}\label{sec:exp}}

To further evaluate the performance of OWA, we conduct trace-driven experiments on a practical real-time video analytics system at the Internet edge with multiple deployed machine learning models.


\subsection{Model Selection for Real-Time Video Analytics}

\begin{figure}[t]
    \centering
    \includegraphics[width=0.8\linewidth]{Figures/202401Fig.drawio.pdf}
    \caption{Video analytics and accuracy calculation.}
    \label{mfig:expreal}
\end{figure}

Real-time video analytics requires sophisticated machine learning models, but edge devices such as tablets and laptops have limited battery capacity.
In many situations, the edge devices are not connected to a persistent power supply, so it is important to manage their workload efficiently.
Here we consider a general scenario where an edge computing device, equipped with multiple machine learning models, performs real-time video analytics on a sequence of video chunks, in practical applications such as traffic monitoring in smart cities and hazard prevention. Different machine learning models (workers) yield different accuracy values (reward) and consume different amounts of energy (resource). For each video chunk, we need to decide whether to process or discard it and, if it is processed, which machine learning model to use, in order to maximize the overall accuracy within the energy constraint.

This application scenario of real-time video analysis with multiple models deployed on the edge device is consistent with the problem formulation of the GMPMW. The video chunks are the incoming tasks, and the multiple machine learning models to process each video chunk are the multiple workers in the GMPMW. The probability distribution of the energy consumption of each model $\mathcal{R}_{t,l}$ can be estimated by model profiling and video pre-processing~\cite{hung2018videoedge}, and the upper bound $\overline{u}_{l}$ and lower bound $\underline{u}_l$ can also be obtained by profiling~\cite{zhang2017live}. The edge device does not know the accuracy and energy consumption of processing each video chunk before that chunk arrives, and only after processing a chunk using a model can it know the corresponding accuracy and energy consumption. The energy budget of the edge device is the limited resource budget. 

\subsection{Video Trace Collection and Model Profiling}

For the video traces, we use a Xiaomi 12 Pro Android smartphone equipped with a Sony IMX766 photosensor to capture the traffic on the road as well as the network condition. We collect $4$ sets of video traces with different lengths, labeled as Traces $1$--$4$. The video frames are grouped into video chunks, each containing 3 seconds of video frames. The video chunks are then sent to the edge device for analysis, so each video chunk is a task.

We deploy $3$ machine learning models on a laptop computer powered by an Intel Core i5-11320H CPU with integrated graphics to analyze the video chunks. The machine learning models deployed are Faster R-CNN~\cite{ren2015faster}, YOLOv5~\cite{yolov5} with medium backbone, and YOLOv5 with large backbone. To obtain the energy consumption of each model, we use pyRAPL~\cite{pyrapl}, a Python toolkit that reads Intel's Running Average Power Limit (RAPL) energy counters.

We obtain the accuracy of each model by model profiling~\cite{zhang2017live}, and we adopt an accuracy criterion based on the Intersection over Union (IoU) metric, which is widely accepted in video analytics~\cite{lin2014microsoft, yolopaper, ren2015faster}. As illustrated in Figure~\ref{mfig:expreal}, each inference generated by an object detection model yields a set of predicted bounding boxes (red boxes). In parallel, we construct a set of ground truth objects, each represented by its own bounding box (green boxes). These ground truth bounding boxes are derived from a highly reliable model, Faster R-CNN~\cite{ren2015faster} with the ResNet-50~\cite{he2016deep} backbone. The accuracy of each model is determined by comparing these two sets of bounding boxes.
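As an illustration of an IoU-based accuracy criterion, the following Python sketch computes the IoU of two axis-aligned boxes and scores a frame as the fraction of ground truth boxes matched by a predicted box. It is a minimal sketch under stated assumptions, not the exact criterion used in our experiments: the box format (corner coordinates), the matching threshold of $0.5$, the greedy one-to-one matching rule, and the function names are all hypothetical choices made for this example.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max) (assumed format)."""
    # Corners of the intersection rectangle.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def frame_accuracy(predicted, ground_truth, threshold=0.5):
    """Fraction of ground truth boxes matched by a distinct predicted box.

    Hypothetical scoring rule: each ground truth box is greedily paired with
    the unused predicted box of highest IoU and counts as matched when that
    IoU is at least `threshold` (0.5 is an assumed value).
    """
    if not ground_truth:
        # Assumed convention: an empty frame is scored 1 only if nothing is predicted.
        return 1.0 if not predicted else 0.0
    unused = list(predicted)
    matched = 0
    for gt in ground_truth:
        best = max(unused, key=lambda p: iou(p, gt), default=None)
        if best is not None and iou(best, gt) >= threshold:
            matched += 1
            unused.remove(best)
    return matched / len(ground_truth)
```

For example, two $2 \times 2$ boxes offset by one pixel in each direction overlap in a unit square, so their IoU is $1/(4+4-1) = 1/7$, which falls below the assumed threshold and would not count as a match.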

We profile the accuracy and the energy consumption of each model in the same environment. The accuracy and energy measurements are taken once per second and averaged over each 3-second video chunk. Further details are given in Appendix~\ref{sec:A.D.2}.
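The per-chunk aggregation described above can be sketched as follows. This is an illustrative Python helper rather than our profiling code: the function name is hypothetical, and dropping a trailing incomplete chunk is an assumption made for the example.

```python
def average_per_chunk(samples, chunk_seconds=3):
    """Average consecutive per-second samples into per-chunk values.

    `samples` holds one measurement per second (accuracy or energy).
    Assumption: a trailing group with fewer than `chunk_seconds` samples
    is dropped rather than averaged.
    """
    n_chunks = len(samples) // chunk_seconds
    return [
        sum(samples[i * chunk_seconds:(i + 1) * chunk_seconds]) / chunk_seconds
        for i in range(n_chunks)
    ]
```

With seven per-second samples, this yields two chunk values (from seconds 1 to 3 and 4 to 6), and the seventh sample is discarded under the stated assumption.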

\subsection{Benchmarks} 

We compare the performance of OWA against that of the following benchmarks: (i) R: the Random algorithm randomly decides whether to process a video chunk with a model or to discard it; (ii) Ada: the Adaptive algorithm selects a random model to process a new chunk if it has previously discarded any chunk; (iii) GOK: the Greedy Online Knapsack algorithm processes every video chunk using the model with the highest efficiency; (iv) EAE: Exploration and Exploitation~\citep{audibert2009exploration}; (v) UCB: Upper Confidence Bound Bandit~\citep{garivier2011upper}; (vi) MOT: Multi-worker One-way Trading~\citep{cao2020optimal}; (vii) MPC: Model Predictive Control~\citep{morari1999model}; (viii) S-OWA: Single-Worker OWA implements a single-worker version of OWA; (ix) A-OWA: Average OWA first selects a fixed model and then decides whether to process the video chunks. Further details on each of the benchmarks are given in Appendix~\ref{sec:A.D.2}.
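To make the simplest model-selection baseline concrete, the following Python sketch shows one possible reading of GOK. It is a hedged illustration, not the implementation evaluated here: we assume that efficiency means profiled mean accuracy divided by profiled mean energy, that a chunk is processed only when the model's profiled energy upper bound fits in the remaining budget, and that all names and data layouts are hypothetical.

```python
def greedy_online_knapsack(chunks, profiles, budget):
    """Process every chunk with the single most efficient model (assumed GOK reading).

    chunks:   list of dicts, model name -> (accuracy, energy) realized on that chunk.
    profiles: dict, model name -> (mean_accuracy, mean_energy, energy_upper_bound).
    budget:   total energy budget.
    Returns (chosen_model, total_accuracy, energy_used).
    """
    # Assumption: efficiency is profiled mean accuracy per unit of mean energy.
    best = max(profiles, key=lambda m: profiles[m][0] / profiles[m][1])
    upper_bound = profiles[best][2]

    total_accuracy = 0.0
    remaining = budget
    for chunk in chunks:
        # The realized energy is unknown before processing, so we only commit
        # when the profiled upper bound is guaranteed to fit (assumption).
        if upper_bound > remaining:
            continue  # discard this chunk
        accuracy, energy = chunk[best]
        total_accuracy += accuracy
        remaining -= energy
    return best, total_accuracy, budget - remaining
```

In a toy run with a budget of $6$ and a most efficient model whose energy upper bound is $3$, two chunks consuming $2.0$ and $2.5$ units are processed and the third is discarded, which leaves $1.5$ units of energy unused even though a different model choice might have exploited them.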

 \begin{figure}[t]
    \centering
    \begin{subfigure}{0.5\linewidth}
        \centering
        \includegraphics[width=\linewidth]{Figures/Set2-1.pdf}
        \caption{Trace $1$ ($1200$s).}
        \label{mfig:exp2a}
    \end{subfigure}\hspace{-3.1mm} % Adjust spacing as needed
    \begin{subfigure}{0.5\linewidth}
        \centering
        \includegraphics[width=\linewidth]{Figures/Set2-2.pdf}
        \caption{Trace $2$ ($1800$s).}
        \label{mfig:exp2b}
    \end{subfigure}
    
    \begin{subfigure}{0.5\linewidth}
        \centering
        \includegraphics[width=\linewidth]{Figures/Set2-3.pdf}
        \caption{Trace $3$ ($2400$s).}
        \label{mfig:exp2c}
    \end{subfigure}\hspace{-3.1mm} % Adjust spacing as needed
    \begin{subfigure}{0.5\linewidth}
        \centering
        \includegraphics[width=\linewidth]{Figures/Set2-4.pdf}
        \caption{Trace $4$ ($3000$s).}
        \label{mfig:exp2d}
    \end{subfigure}
    % \vspace{-4.5mm}
    \caption{Comparing OWA against benchmarks.}
    \label{mfig:exp2}
\end{figure}

\subsection{Performance}

To compare the performance of the OWA algorithm against that of the benchmarks, we apply all algorithms to each video trace with different resource budgets $K$, as shown in Figure~\ref{mfig:exp2}. 


On all four video traces and for all resource budgets considered, OWA outperforms every benchmark.
We next discuss the performance of the benchmarks in three groups: \textcircled{1} Random, Adaptive, and GOK; \textcircled{2} EAE, UCB, MOT, and MPC; and \textcircled{3} S-OWA and A-OWA.

The Random algorithm performs worse than OWA because it does not consider the resource constraint. The Adaptive algorithm considers the resource constraint, but it simply controls the number of tasks (chunks) processed rather than the resource consumption, so it performs worse than OWA. The GOK algorithm sticks to the worker (model) with the highest average efficiency, but the most efficient worker may not fully utilize the available resource (energy), leading to inferior performance.

EAE and UCB perform worse than OWA because they are not aware of the resource constraint. MOT performs worse than the OWA algorithm because, in our system, the reward (accuracy) and the resource consumption are not known when a video chunk arrives. However, these are important variables for the decision-making process in the MOT algorithm. When MOT can only use the profiled data to make decisions, its performance suffers. MPC performs worse than the OWA algorithm because the real-world street scenes captured in our video traces are highly fluctuating. As a result, the predictions made by MPC are not accurate.

Finally, we discuss S-OWA and A-OWA, which are different variants of OWA. S-OWA focuses only on the best possible worker (model) according to model profiling. It performs worse than OWA because it does not balance between the reward and resource consumption by utilizing all workers. 
On the other hand, A-OWA first chooses the worker (model) and then makes a decision based on that worker's baseline. This strategy of A-OWA leads to worse performance than OWA because each worker only updates its own baseline, but the different reward and resource consumption levels among different workers are coupled in GMPMW. In contrast, OWA updates the baselines in a joint manner and assigns each task to different workers based on the model profiling result, thus achieving superior performance. 

In conclusion, our experimental results demonstrate that OWA effectively utilizes the multiple workers in GMPMW across a variety of realistic system settings, and they show the importance of properly handling the multiple workers in the proposed approach.
