
\section{Trivial Accuracy in the LLP and MIL} \label{app:trivial_performance}



First consider the LLP bags $\mc{B}$ from Theorem \ref{thm:LLP-imposssibility}. Each bag has size $2$ and aggregate label $1$, i.e., it is satisfied if and only if exactly one of its feature-vectors is labeled $1$. Now consider just one bag from $\mc{B}$. This bag is satisfied by neither the constant $0$ nor the constant $1$ classifier. On the other hand, the expected accuracy of a random labeling is $1/2$, and therefore ${\sf Trv}_{\sf LLP}(\mc{B}) = 1/2$.
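
To spell out the value $1/2$, assume (as seems implicit above) that the random labeling assigns the two feature-vectors of the bag independent uniform labels $y_1, y_2 \in \{0,1\}$; the symbols $y_1, y_2$ are introduced here only for illustration. Of the four equally likely labelings, exactly two have label sum $1$:
\[
\Pr[y_1 + y_2 = 1] \;=\; \Pr\big[(y_1,y_2) \in \{(0,1),(1,0)\}\big] \;=\; \frac{2}{4} \;=\; \frac{1}{2},
\]
whereas the constant labelings $(0,0)$ and $(1,1)$ have label sums $0$ and $2$ and never satisfy the bag.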

Next, let $\mc{B}$ be the MIL bags from Theorem \ref{thm:MIL-imposssibility}. Each of these bags has size $2$; some have aggregate label $0$ and the others have aggregate label $1$. Consider just two bags, one with aggregate label $0$ and the other with aggregate label $1$. For $a \in \{0,1\}$, the constant $a$ labeling satisfies the bag with aggregate label $a$ and does not satisfy the bag with aggregate label $1-a$. On the other hand, the random labeling satisfies the bag with aggregate label $0$ with probability $1/4$ and the bag with aggregate label $1$ with probability $3/4$. Thus, the expected number of bags satisfied by the random labeling is $1$ out of $2$. Therefore, ${\sf Trv}_{\sf MIL}(\mc{B}) = 1/2$.
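
These probabilities can be checked directly, assuming the random labeling draws independent uniform labels $y_1, y_2 \in \{0,1\}$ for the two feature-vectors of a bag and that the MIL bag-label is the OR of its instance labels (the symbols $y_1, y_2$ are illustrative):
\[
\Pr[y_1 \vee y_2 = 0] \;=\; \Pr[y_1 = 0]\cdot\Pr[y_2 = 0] \;=\; \frac{1}{4},
\qquad
\Pr[y_1 \vee y_2 = 1] \;=\; 1 - \frac{1}{4} \;=\; \frac{3}{4}.
\]
By linearity of expectation, the expected number of satisfied bags among the two is $\frac{1}{4} + \frac{3}{4} = 1$, i.e., an expected accuracy of $1/2$.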

\subsection{Inapplicability of bag composition for weak to strong learning in MIL} \label{sec:mil-limit}
Suppose we are given a classifier on the original small bags with accuracy bounded away from $1$. In LLP, when we form composite bags, each as a union of several randomly chosen original bags, we obtain an erroneous prediction on most of the composite bags thus formed. Our LLP algorithm uses this error-gap amplification. In MIL, however, the union of several randomly chosen original bags has bag-label $1$ as soon as even one of the constituent original bags has bag-label $1$. This happens with high probability when a significant fraction of the original bags have bag-label $1$. Thus, taking large unions would result in most composite bags having bag-label $1$, and therefore the constant $1$ predictor would have high accuracy on the composite bags even if it has low accuracy on the original bags.
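
As a rough illustration, under the simplifying assumption (not part of the formal setup) that a composite bag is the union of $k$ original bags drawn independently and uniformly with replacement, and that a $p$ fraction of the original bags have bag-label $1$, we get
\[
\Pr[\text{composite bag has bag-label } 1] \;=\; 1 - (1-p)^k .
\]
For instance, with the illustrative values $p = 0.3$ and $k = 10$ this is $1 - 0.7^{10} \approx 0.97$, so the constant $1$ predictor satisfies roughly $97\%$ of the composite bags although it satisfies only $30\%$ of the original bags.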

\subsection{Extension of our results to multi-class classification} \label{sec:multiclass}
Our results on the impossibility of boosting (Theorems \ref{thm:LLP-imposssibility}, \ref{thm:MIL-imposssibility}) rule out boosting in LLP and MIL for binary classification, and since binary classification is a special case of the multi-class setting, they also rule out boosting in the multi-class setting.
Our algorithmic results (Theorem \ref{thm:weaktostrong}) are also for binary classification. However, they can be extended to multi-class classification. For this, we can define LLP in the multi-class setting, where the bag-label is a histogram over the label set, and a bag is satisfied if the predicted label histogram matches its bag-label. The algorithms remain the same, up to a change in the parametric dependence on the label-set size. Lemma \ref{lemma:littlewood_offord} can be applied separately for each label, followed by a union bound over the per-label error probabilities.
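
One possible way to write this definition down, using notation introduced here only for illustration: for a label set $\{0, \dots, k-1\}$, a bag $B$ with true instance labels $y(\mathbf{x})$ has bag-label $\sigma_B = (\sigma_{B,0}, \dots, \sigma_{B,k-1})$, and a classifier $h$ satisfies $B$ if its predicted histogram agrees with $\sigma_B$ in every coordinate:
\[
\sigma_{B,\ell} \;=\; \big|\{\mathbf{x} \in B \,:\, y(\mathbf{x}) = \ell\}\big|,
\qquad
\big|\{\mathbf{x} \in B \,:\, h(\mathbf{x}) = \ell\}\big| \;=\; \sigma_{B,\ell} \ \ \text{ for all } \ell \in \{0,\dots,k-1\}.
\]
For $k = 2$ we have $\sigma_{B,0} = |B| - \sigma_{B,1}$, so matching the histogram is the same as matching the number of instances labeled $1$, which recovers the binary notion of a satisfied bag.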


\input{result_mil_impossibility_mil}




\section{Weighted  bags to unweighted bags} 
\label{app:wtdtounwtd}
\begin{figure}[!htb]
\begin{mdframed}
\small
\textbf{Input:} Bags $\mc{B}_w = (B_i, w_i)_{i=1}^m$, $T$.\\
\\
\textbf{Steps:}
\\ \textbf{1.} Normalize the weights by a factor $Z$ so that $\sum_{i=1}^m w_i = m$.
\\ \textbf{2.} Define $\mc{B}$ to be the unweighted collection of bags and initialize it to $\emptyset$.
\\ \textbf{3.} for  $i \in [m]$: \\
\hspace*{2em} \textbf{3.1} Define $n_i =  \lceil{w_i(T-1)}\rceil$.\\
\hspace*{2em} \textbf{3.2} Add $n_i$ copies of $B_i$ to $\mc{B}$.\\
\\
\textbf{Output:} Output $\mc{B}$.
\end{mdframed}
\caption{Weighted to unweighted collection of bags}\label{fig:wtdtounwtd}
\end{figure}
The algorithm to convert a weighted collection of bags to an unweighted collection is given in Fig. \ref{fig:wtdtounwtd}. First, observe that $|\mc{B}| = \sum_{i=1}^m \lceil{w_i(T-1)}\rceil \leq \sum_{i=1}^m (w_i(T-1)+1) \leq (T-1)m + m = Tm$, where we use $\sum_{i=1}^m w_i = m$. On the other hand, $|\mc{B}| =  \sum_{i=1}^m \lceil{w_i(T-1)}\rceil \geq (T-1)m$. 

Now, to see that the error in accuracy is at most $O(1/T)$, observe that for any subset $I \subseteq [m]$,  $\sum_{i\in I} {w_i(T-1)} \leq \sum_{i\in I} \lceil{w_i(T-1)}\rceil \leq \sum_{i\in I} {w_i(T-1)} + |I|$. Therefore, the normalized error in the weight corresponding to $I$ is at most $|I|/((T-1)m) \leq m/((T-1)m) \leq 1/(T-1) = O(1/T)$ for $T > 1$. 
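
As a small sanity check with illustrative numbers (not taken from our experiments), let $m = 3$, $T = 5$, and let the normalized weights be $(w_1, w_2, w_3) = (0.3, 1.2, 1.5)$, so that $\sum_{i} w_i = 3 = m$. Then
\[
(n_1, n_2, n_3) \;=\; \big(\lceil 1.2 \rceil, \lceil 4.8 \rceil, \lceil 6.0 \rceil\big) \;=\; (2, 5, 6),
\qquad
|\mc{B}| \;=\; 13 ,
\]
which indeed lies between $(T-1)m = 12$ and $Tm = 15$. For $I = \{1, 2\}$ we have $\sum_{i \in I} w_i(T-1) = 6$ and $\sum_{i \in I} n_i = 7$, so the rounding adds $1 \leq |I| = 2$ copies, and the normalized error is $1/12 \leq |I|/((T-1)m) = 1/6 \leq 1/(T-1) = 1/4$.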

\section{Probabilities for the support of $\ol{D}$}\label{sec:suppD}
In Step 2 of Figure \ref{algo:DistnDbar}, the probability under $\ol{D}$ of a fixed configuration $\{\mc{Q}_i\}_{i=1}^t$ with $r := |\{ i \in [t]\,\mid\, \mc{Q}_i = \star\}|$ is $\frac{m^r}{m^t}\frac{1}{2^r}$, since the number of choices for the $\star$-coordinates is $m^r$, while the total number of choices is $m^t$. Further, with $(1/2)^t$ probability we have the specific choices of the $r$ coordinates with $\star$ in Step 2. Iterating over all possible configurations $\{\mc{Q}_i\}_{i=1}^t$ and assigning their probabilities to the resultant $(\ol{B},\ol{\sigma})$ in Step 3 yields the support of $\ol{D}$ along with the corresponding probabilities.

\section{Additional Experiments}\label{sec:additional_expts}
In Table \ref{tab:training_on_small_bags}, we report results obtained by training directly on the original small bags. When comparing these results with those in Tables \ref{tab:table1-appendix} and \ref{tab:table2-appendix}, we find that training directly on original bags yields better accuracy on the test sets of original bags and individual instances for the Heart dataset and comparable performance on the Australian and Adult datasets. For both Synthetic datasets, however, the strong classifier obtained using our proposed algorithm achieves better performance on original bags compared to direct training.


\begin{table}
\centering
\begin{tabular}{rrr}
\toprule
$q$ &   Train Bags & Test Instances   \\
\midrule
 &   \textit{Heart} &  \\
5 & $46.370 \pm 5.871$ & $82.578 \pm 4.767$ \\
15 & $31.111 \pm 12.258$ & $74.844 \pm 5.960$ \\
\midrule
 &  \textit{Australian} &  \\
 5 & $55.064 \pm 6.881$ & $84.196 \pm 3.815$ \\
15 & $26.190 \pm 6.000$ &  $77.121 \pm 5.776$ \\
\midrule
 &  \textit{Adult} &  \\
 5 & $47.368 \pm 0.650$ & $83.539 \pm 0.588$  \\
15 & $12.899 \pm 1.762$ & $80.119 \pm 2.172$ \\
\midrule
 &  \textit{Synthetic Random} &  \\
 5 & $81.783 \pm 2.369$ & $95.627 \pm 0.736$  \\
15 & $42.400 \pm 3.795$ & $88.535 \pm 2.357$ \\
\midrule
 &  \textit{Synthetic Hard} &  \\
5 & $74.546 \pm 6.029$ & $92.170 \pm 2.506$  \\
15 & $32.313 \pm 4.112$ & $82.592 \pm 5.045$  \\
\bottomrule
\end{tabular}
\caption{Results after training directly on original (small) bags.}
\label{tab:training_on_small_bags}
\end{table}


