\section{Related Work} \label{sec: review}
\paragraph{Optimization Algorithms for MOO}
There has been rising interest in MOO in deep learning, 
mostly in the context of multi-task learning.
However, most existing methods cannot be applied to 
the general OPT-in-Pareto problem.
A large body of recent work \citep{chen2018gradnorm,kendall2018multi,NEURIPS2018_432aca3a,yu2020gradient,NEURIPS2020_16002f7a,Wu2020Understanding,fifty2020measuring,javaloy2021rotograd} focuses on improving non-convex optimization for finding \emph{some} model in the Pareto set, 
but these methods cannot search for a \emph{special} model satisfying a specific criterion.

\paragraph{Specific Instantiations of OPT-in-Pareto}
One previous work \citep{mahapatra2020multi} and two concurrent works \citep{kamani2021pareto,chen2021weighted} study specific instantiations of the general OPT-in-Pareto problem and are thus highly related to this paper. We review them in detail.
\citet{mahapatra2020multi} aims to search for a Pareto model that satisfies a constraint on the ratio between the different objectives, which can be viewed as an OPT-in-Pareto problem when the criterion $F$ is a proper measure of constraint violation (i.e., the non-uniformity score defined in \citet{mahapatra2020multi}). EPO, the algorithm proposed in \citet{mahapatra2020multi}, relies heavily on a special property of the ratio-constraint problem: there always exists an update direction that gives a Pareto improvement, reduces the constraint violation, or both. A general OPT-in-Pareto problem does not have such a nice property, which makes EPO a specialized algorithm for the ratio-constraint problem rather than an algorithm for the general OPT-in-Pareto problem. In Section~\ref{sec: subset application} we demonstrate that PNG is able to recover the functionality of EPO while being a more general algorithm for OPT-in-Pareto.
\citet{kamani2021pareto} formulates fairness learning as an MOO problem in which the accuracy and the fairness measure are the two objectives. It first proposes PDO, an algorithm that converges to the Pareto stationary set by viewing MOO as a bi-level optimization (this is a standard MOO algorithm that does not solve any instance of OPT-in-Pareto), and then BP-PDO, a modification of PDO that seeks a Pareto model satisfying the ratio constraint considered in \citet{mahapatra2020multi}. Admittedly, it is possible to extend BP-PDO to general OPT-in-Pareto problems, but such an extension is non-trivial: even for the special ratio-constraint problem, it is unclear what convergence and optimality guarantees BP-PDO has (only a guarantee for PDO is given in \citet{kamani2021pareto}).
In comparison, our PNG is shown to converge to a local optimum of the OPT-in-Pareto problem.
\citet{chen2021weighted} aims to pre-train a multi-task model such that the representations of the tasks are similar. Their problem is essentially an OPT-in-Pareto problem in which the discrepancy of task representations is chosen as the criterion function. Compared with PNG, the proposed TAWT algorithm requires computing an inverse-Hessian product at each iteration, which makes its computational cost high.

\paragraph{Approximation of Pareto Set}
There has been increasing interest in
finding a compact approximation of the Pareto set. \citet{navon2020learning,lin2020controllable} use hypernetworks to approximate the map from linear scalarization weights to the corresponding Pareto solutions; these methods cannot fully profile non-convex Pareto fronts due to the limitation of linear scalarization \citep{boyd2004convex}, and the use of a hypernetwork introduces extra optimization difficulty. 
Another line of work \citep{lin2019pareto,mahapatra2020multi} approximates the Pareto set by Pareto models obtained with different user preference vectors that rank the relative importance of the tasks; these methods need a good heuristic design of the preference vectors, which requires prior knowledge of the Pareto front. 
\citet{ma2020efficient} leverages manifold gradients to conduct a local random walk on the Pareto set but suffers from high computational cost. \citet{deist2021multi} approximates the Pareto set by maximizing hypervolume, which also requires prior knowledge for a careful choice of a good reference vector. \citet{liu2021profiling} introduces a repulsive force to encourage diversity among the models without hurting their Pareto optimality.
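
As a toy illustration of why linear scalarization cannot profile a non-convex Pareto front, consider the following example; it is a sketch under stated assumptions, and the symbols $t$, $\ell_1$, $\ell_2$, $w_1$, $w_2$ and $g$ are introduced only for this illustration. Suppose the attainable objective vectors of a two-objective minimization problem are exactly the curve $(\ell_1, \ell_2) = (t, 1 - t^2)$ with $t \in [0, 1]$, so that every point on the curve is Pareto optimal. For any weights $w_1, w_2 > 0$, linear scalarization minimizes
\[
g(t) = w_1 t + w_2 \, (1 - t^2), \qquad g''(t) = -2 w_2 < 0 .
\]
Since $g$ is strictly concave on $[0, 1]$, its minimum is attained only at an endpoint, where $g(0) = w_2$ and $g(1) = w_1$. Hence every choice of weights returns $(0, 1)$ or $(1, 0)$, and no interior point of this front is recovered.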

\paragraph{Applications of MOO}
Multi-task learning can also be applied to improve learning in many other domains, including domain generalization \citep{dou2019domain,Carlucci_2019_CVPR,albuquerque2020improving}, domain adaptation \citep{sun2019unsupervised,luo2021learnable}, model uncertainty \citep{NEURIPS2019_a2b15837,zhang2020auxiliary,xie2021innout}, adversarial robustness \citep{yang2020multitask}, and semi-supervised learning \citep{NEURIPS2020_06964dce}. All of these applications use linear scalarization to combine the multiple objectives; it is thus interesting to apply the proposed OPT-in-Pareto framework to them, which we leave for future work. 

