\medskip

\noindent
{\bf Previous Related Work.}
Multiple instance learning (MIL), specifically its classification setting, was proposed by \cite{DLL97} to model drug activity detection, where the bag-label is an {\sf OR} of its (unknown) instance-labels (all labels are $\{0,1\}$-valued) and the goal is to train an instance-label classifier. MIL has subsequently been used in various other applications such as medical image analysis~\citep{WYY15}, video analysis~\citep{SDB13}, time series prediction~\citep{M98}, and information retrieval~\citep{LY00}.
 
In multiple instance regression (MIR), introduced by \cite{RP01}, the underlying task is regression over real-valued labels. The bag-label of each bag is the label of a \emph{primary} instance in that bag.
The earliest applications of MIR formulations were in remote sensing, such as aerosol optical depth prediction~\citep{WRHOV08} and crop yield prediction~\citep{WL07}. More recently, \cite{Liangetal21} modeled image quality assessment, where the quality of an image depends on that of a constituent prime image, as MIR and developed model training methods for it. The image analysis task of facial age estimation has also been studied using MIR techniques by \cite{Liu2019WitnessDI}, and MIR has recently been used to model the continuous response of bags of neoantigens~\citep{Parketal22}. Other applications of MIR are possible in user modeling for online advertising where, due to privacy considerations, an online purchase or conversion event cannot be linked to a unique user click; instead, one only has a subset or bag of clicks which could have resulted in the conversion (see Section 2.1 of \cite{o2022challenges}).

Loss-based methods which transform the problem into instance-level regression include {\sf Aggregated-MIR}, which assigns the bag-label to the average feature-vector of each bag, and {\sf Instance-MIR}, in which the bag-label is assigned to each instance in a bag (see \cite{WRHOV08}). More sophisticated EM-based methods are primary-MIR (PIR)~\citep{RP01}, pruning MIR~\citep{WRHOV08} and mixture-model MIR~\citep{WLV7}, while \cite{WLR08,TF18} proposed clustering-based methods for MIR. However, the work of \cite{KSABGR} is (to the best of our knowledge) the first to investigate in detail the learning-theoretic aspects of MIR, showing (i) error bounds for generalizing regressors trained on randomly sampled bags with iid feature-vectors to the underlying feature-vector distribution, and (ii) the NP-hardness of even approximately optimizing a linear regressor on arbitrary bag distributions. Additionally, \cite{KSABGR} provided an optimization-based model training approach for the MIR problem, albeit without any performance guarantees.
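
As an illustrative sketch of the two loss-based baselines above, suppose we are given bags $B_1,\dots,B_m$ of feature-vectors with bag-labels $y_1,\dots,y_m$ and a regressor $f$. This notation and the choice of squared loss are assumptions made only for this illustration, and need not match the formulation of \cite{WRHOV08}. Under these assumptions, {\sf Aggregated-MIR} would minimize
\[
  \sum_{j=1}^{m} \Bigl( f\Bigl( \tfrac{1}{|B_j|} \sum_{\mathbf{x} \in B_j} \mathbf{x} \Bigr) - y_j \Bigr)^{2},
\]
where the average feature-vector of each bag is treated as a single labeled instance, while {\sf Instance-MIR} would minimize
\[
  \sum_{j=1}^{m} \sum_{\mathbf{x} \in B_j} \bigl( f(\mathbf{x}) - y_j \bigr)^{2},
\]
where every instance in a bag inherits that bag's label.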
 
 