
\documentclass[10pt]{article} % For LaTeX2e
%\usepackage{tmlr}
\usepackage[accepted]{tmlr}
% If accepted, instead use the following line for the camera-ready submission:
%\usepackage[accepted]{tmlr}
% To de-anonymize and remove mentions to TMLR (for example for posting to preprint servers), instead use the following:
%\usepackage[preprint]{tmlr}

% Optional math commands from https://github.com/goodfeli/dlbook_notation.
\input{math_commands.tex}

\usepackage{amsmath,amsfonts,amssymb}
\usepackage{amsthm}

\let\AND\relax
\usepackage{algorithmic}
%\usepackage{algpseudocode} % replaces algorithmic
\usepackage{algorithm}

\usepackage{array}
\usepackage[caption=false,font=normalsize,labelfont=sf,textfont=sf]{subfig}
\usepackage{textcomp}
\usepackage{stfloats}
\usepackage{url}
\usepackage{graphicx}
\usepackage{cite}
\usepackage{xcolor}
\usepackage{booktabs}
\usepackage{multirow}
\usepackage{tikz}
\usepackage{pgfplots}
\usepackage{enumitem}
\usepackage{comment}

\usepackage{subcaption}
%\usepackage{subfigure}


\pgfplotsset{compat=newest}

% Theorem environments
\newtheorem{theorem}{Theorem}
\newtheorem{lemma}{Lemma}
\newtheorem{definition}{Definition}
\newtheorem{remark}{Remark}
\newtheorem{example}{Example}
\newtheorem{proposition}{Proposition}

\usepackage{hyperref}


\title{Stochastic Multi-Objective Multi-Armed Bandits: Regret Definition and Algorithm}

% Authors must not appear in the submitted version. They should be hidden
% as long as the tmlr package is used without the [accepted] or [preprint] options.
% Non-anonymous submissions will be rejected without review.

\author{\name Mansoor Davoodi \email Mansoor.DavoodiMonfared@here.com \\
      \addr $^1$ Faculty of Electrical Engineering and Information Technology, 
      Ruhr-University Bochum, Bochum 44801, Germany\\
      $^2$ Department of Computer Science and Information Technology, Institute for Advanced Studies in Basic Sciences, Prof. Yousef Sobouti Blvd. 444, 45137-66731, Zanjan, Iran\\
      $^3$ HERE Deutschland GmbH Co. KG, Invalidenstraße 116, 10115 Berlin, Germany\\
      \AND
      \name Setareh Maghsudi \email Setareh.Maghsudi@ruhr-uni-bochum.de \\
      \addr Faculty of Electrical Engineering and Information Technology\\
      Ruhr-University Bochum, Bochum 44801, Germany
      }

% The \author macro works with any number of authors. Use \AND 
% to separate the names and addresses of multiple authors.

\newcommand{\fix}{\marginpar{FIX}}
\newcommand{\new}{\marginpar{NEW}}

\def\month{02}  % Insert correct month for camera-ready version
\def\year{2026} % Insert correct year for camera-ready version
\def\openreview{\url{https://openreview.net/forum?id=7N7sK5CFuP}} % Insert correct link to OpenReview for camera-ready version


\begin{document}


\maketitle

\begin{abstract}
Multi-armed bandit (MAB) problems are widely applied to online optimization tasks that require balancing exploration and exploitation. In practical scenarios, these tasks often involve multiple conflicting objectives, giving rise to multi-objective multi-armed bandits (MO-MAB). Existing MO-MAB approaches predominantly rely on the Pareto regret metric introduced by \citet{drugan2013designing}. However, this metric has notable limitations, particularly in accounting for all Pareto-optimal arms simultaneously. To address these limitations, we propose a novel and comprehensive regret metric that ensures balanced performance across conflicting objectives. Additionally, we introduce the concept of \textit{Efficient Pareto-Optimal} arms, a notion tailored to online optimization. Based on our new metric, we develop a two-phase MO-MAB algorithm that achieves sublinear regret for both Pareto-optimal and efficient Pareto-optimal arms.
\end{abstract}



%==========================================
%==========================================
\input{Introduction}
\input{RelatedWork}
\input{LitRegretAnalysis}
\input{RegretDef}
\input{Algorithm}
\input{Experiment}
%==========================================
%==========================================


\section{Conclusion}
This paper introduced a novel regret metric for multi-objective multi-armed bandits (MO-MAB) designed to overcome the limitations of existing metrics by comprehensively accounting for all objectives simultaneously. We further defined the concept of efficient Pareto-optimal arms as those residing on the convex hull of the Pareto-optimal front. Leveraging this metric, we proposed a new algorithm proven to achieve sublinear regret for both Pareto-optimal and efficient Pareto-optimal arms in stochastic environments.

While this work presents the first algorithm rigorously evaluated under the proposed regret framework, our analysis reveals limitations in its theoretical regret bound and computational complexity. Future research should prioritize developing algorithms that achieve tighter regret guarantees, potentially through alternative algorithmic designs or refined analytical techniques. Moreover, the regret framework established here provides a formal foundation for extending the analysis to the significantly more challenging setting of adversarial MO-MAB. Developing efficient algorithms that are robust to non-stochastic rewards is therefore a critical and promising direction for multi-objective online decision-making.



\bibliography{main}
\bibliographystyle{tmlr}

\appendix
\input{Appendix}


\end{document}
