\section{Analysis of Pareto Regret Used in the Literature}
\label{Drugan regret}

\citet{drugan2013designing} introduced the first regret metric for evaluating the efficiency of MO-MAB algorithms, known as \textit{Pareto regret}. This metric quantifies, for each arm \( a \), the distance between its reward vector \(\mu(a)\) and the Pareto-optimal (PO) set \(\mathcal{A}^*\). To do this, a \textit{virtual reward vector} \( \nu(a)^* \) is constructed by adding the same value \(\epsilon\) to each objective of \( a \) until the resulting vector becomes \textit{incomparable} with every arm in \(\mathcal{A}^*\), that is, until no arm in \(\mathcal{A}^*\) dominates it. The regret \( \Delta(a) \) is then defined as the distance between \( \nu(a)^* \) and the original reward vector \(\mu(a)\), which is the smallest such shift needed to reach the Pareto front.
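
As a minimal formal sketch of this construction, under assumptions that the description above leaves implicit, let \(\mathbf{1}\) denote the all-ones vector of dimension \(D\) (the number of objectives), and take dominance in the usual Pareto sense (at least as large in every objective and strictly larger in at least one). The subscripted symbols \(\nu_\epsilon(a)\) and \(\epsilon^*(a)\) are introduced here only for illustration:
\[
\nu_\epsilon(a) = \mu(a) + \epsilon \mathbf{1}, \qquad
\epsilon^*(a) = \inf \left\{ \epsilon \ge 0 \; : \; \nu_\epsilon(a) \mbox{ is not dominated by } \mu(a^*) \mbox{ for any } a^* \in \mathcal{A}^* \right\},
\]
\[
\nu(a)^* = \mu(a) + \epsilon^*(a) \mathbf{1}, \qquad
\Delta(a) = \| \nu(a)^* - \mu(a) \|_2 = \sqrt{D} \, \epsilon^*(a).
\]
The infimum is used because the smallest admissible shift need not be attained. If the distance is measured by \(\epsilon^*(a)\) itself rather than by the Euclidean norm, the two quantities differ only by the constant factor \(\sqrt{D}\). Under either convention, every arm in \(\mathcal{A}^*\) has \(\Delta(a) = 0\).
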
While this metric is simple and widely adopted, it has several notable limitations. The most significant is that it measures the distance to the Pareto front along a single direction (the one obtained by shifting every objective by the same \(\epsilon\)), without considering how the arm performs on each objective. As a result, algorithms that optimize a single objective can achieve low regret even if they perform poorly on the other objectives. For example, algorithms designed to prioritize one objective (see, e.g., \citet{xu2023pareto}) may appear effective under this metric despite failing to achieve well-balanced multi-objective performance.

Figure~\ref{DruganCounterExample} (left panel) illustrates this problem in an MO-MAB instance with three arms. Here, the Pareto UCB algorithm presented in \citet{drugan2013designing} repeatedly (and fairly) selects the suboptimal arm with reward \( (0,0) \), even though it is dominated by the arms \( (1,0) \) and \( (0,1) \). Indeed, the second round is the only round in which the algorithm does not select this dominated arm. The regret bound \(\sum_{a \notin \mathcal{A}^*} \frac{8 \cdot \log \left( T \sqrt[4]{D |\mathcal{A}^*|} \right)} {\Delta(a)}\) claimed by \citet{drugan2013designing}, where \(T\) is the number of plays and \(D\) is the number of objectives, still holds; however, because \(\Delta(a)\) may be a very small positive value for such a dominated arm, the bound is loose and practical performance can be poor. This issue is compounded in higher-dimensional settings, where the number of suboptimal arms can grow exponentially with the number of objectives, dramatically reducing the probability of selecting a PO arm in any given round.
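
The following worked computation makes the three-arm example concrete. It is a sketch that assumes the usual Pareto dominance relation (at least as large in every objective and strictly larger in at least one) and writes \(a_0\), a label introduced here for illustration, for the arm with reward \((0,0)\). Adding \(\epsilon\) to each objective of \(a_0\) gives the shifted vector \((\epsilon, \epsilon)\). For any \(0 < \epsilon < 1\),
\[
(\epsilon, \epsilon) \mbox{ vs. } (1,0): \quad \epsilon < 1 \mbox{ and } \epsilon > 0,
\qquad
(\epsilon, \epsilon) \mbox{ vs. } (0,1): \quad \epsilon > 0 \mbox{ and } \epsilon < 1,
\]
so the shifted vector is strictly worse in one objective and strictly better in the other with respect to each PO arm, and is therefore incomparable with both. Since this holds for every \(\epsilon > 0\), however small, the shift required to leave the dominated region satisfies
\[
\inf \left\{ \epsilon > 0 \; : \; (\epsilon, \epsilon) \mbox{ is incomparable with } (1,0) \mbox{ and } (0,1) \right\} = 0,
\]
and \(\Delta(a_0)\) is zero or an arbitrarily small positive value, even though playing \(a_0\) forfeits a full unit of reward in one objective relative to either PO arm. With \(D = 2\) and \(|\mathcal{A}^*| = 2\), the corresponding term of the bound is
\[
\frac{8 \log \left( T \sqrt[4]{4} \right)}{\Delta(a_0)} = \frac{8 \log \left( \sqrt{2} \, T \right)}{\Delta(a_0)},
\]
which grows without limit as \(\Delta(a_0) \to 0\). The bound then remains valid but places essentially no restriction on how often \(a_0\) is played.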

Additionally, \citet{drugan2013designing} introduced a \textit{fairness} concept to encourage more balanced play across the PO arms. However, this notion implicitly assumes that the PO solutions are uniformly distributed along the Pareto front, a condition that is rarely met in real-world problems. For example, Figure~\ref{DruganCounterExample} (right panel) shows a non-uniform Pareto front where fairness fails to ensure diversity. When the algorithm plays ``fairly'' in this sense, it may repeatedly select arms clustered near one extreme of the Pareto front, neglecting other promising regions.
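
To make the argument concrete, the sketch below assumes that unfairness is measured, as we understand the definition in \citet{drugan2013designing}, by the variance of the number of pulls across the PO arms. The symbols \(\phi(T)\), \(N_a(T)\) (the number of times arm \(a\) is pulled in \(T\) plays), and \(\bar{N}(T)\) are notation introduced here for illustration:
\[
\phi(T) = \frac{1}{|\mathcal{A}^*|} \sum_{a \in \mathcal{A}^*} \left( N_a(T) - \bar{N}(T) \right)^2,
\qquad
\bar{N}(T) = \frac{1}{|\mathcal{A}^*|} \sum_{a \in \mathcal{A}^*} N_a(T).
\]
Under this reading, \(\phi(T) = 0\) exactly when all PO arms are pulled equally often, regardless of where those arms lie on the front. In the instance of Figure~\ref{DruganCounterExample} (right panel), \(14\) PO arms are shown, of which \(12\) satisfy \(f_1 \le 1.1\). A perfectly fair algorithm that plays only PO arms therefore spends
\[
\frac{12}{14} = \frac{6}{7} \approx 0.86
\]
of its pulls inside that cluster and only \(1/14 \approx 0.07\) on each of the two remaining arms, so minimizing \(\phi(T)\) does not by itself spread the plays along the front.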

These challenges highlight the shortcomings of the existing Pareto regret metric. Although it provides useful insights in certain cases, it does not adequately capture the trade-offs inherent in MO-MAB problems. Therefore, there is a clear need for a more comprehensive regret metric that evaluates algorithm performance holistically, accounting for both balance across objectives and diversity along the Pareto front.



\begin{figure}[h]
    \centering
    \subfloat[]{
        \begin{tikzpicture}[scale=1.5]
            \draw[->] (-0.3,0) -- (1.2,0) node[right] {$f_1$};
            \draw[->] (0,-0.3) -- (0,1.2) node[above] {$f_2$};
            
            \filldraw[black] (0,0) circle (2pt) node[below left] {$(0,0)$};
            \filldraw[black] (1,0) circle (2pt) node[below] {$(1,0)$};
            \filldraw[black] (0,1) circle (2pt) node[left] {$(0,1)$};
            
            \draw[blue, thick] (1,0) -- (0,1);
        \end{tikzpicture}
        \label{example3points}
    }
    \hspace{1cm}
    \subfloat[]{
        \begin{tikzpicture}[scale=0.4]
            \draw[->] (-0.2,0) -- (4.5,0) node[right] {$f_1$};
            \draw[->] (0,-0.2) -- (0,4.5) node[above] {$f_2$};
            
            \draw[thin,domain=0:4,samples=100,smooth,variable=\x] 
                plot ({\x},{4-\x^2/4});
            
            \foreach \x in {0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1} {
                \filldraw[black] (\x,{4-\x^2/4}) circle (1.5pt);
            }
            
            \filldraw[black] (2.8,{4-2.8^2/4}) circle (1.5pt);
            \filldraw[black] (4,{4-4^2/4}) circle (1.5pt);
        \end{tikzpicture}
        \label{nondiverse_pareto_front}
    }
    \caption{(a) MO-MAB instance with three arms: $(1,0)$, $(0,1)$, and $(0,0)$. (b) PO solutions in a bi-objective maximization problem that are not uniformly spread along the Pareto front.}
    \label{DruganCounterExample}
\end{figure}

