\subsection{Overview of Techniques}\label{sec:overview}
{\bf Impossibility of Boosting in LLP} (Theorem \ref{thm:LLP-imposssibility}). Our construction follows from the well-known \textit{semi-definite programming} (SDP) integrality gap of \citet{Feige2002} for the Max-Cut problem. In that construction, for some arbitrarily small $\eps > 0$, with $d$ depending on $\eps$, the vertices of the graph are given by points on the $(d-1)$-dimensional unit sphere $\mathbb{S}^{d-1}$. For any constant $\alpha \in [1/2, 1)$, each edge is between points that are at an angle of at least $\alpha\pi$. Using techniques related to spherical isoperimetry and concentration of measure in high dimensions, the authors prove that there is no cut in the graph separating more than an $(\alpha + \eps)$-fraction of the edges. We create a $2$-sized bag for each edge, with the edge's two end-points as the bag's two feature-vectors, and assign each bag the aggregate label $1$, i.e., a bag is satisfied if exactly one of its feature-vectors is labeled $1$ or, equivalently, the corresponding edge is separated. The cut upper bound of $(\alpha + \eps)$ thus directly gives us the upper bound on the best possible accuracy of any classifier. On the other hand, since the angle between the feature-vectors of any edge is at least $\alpha\pi$, a random halfspace passing through the origin -- given by ${\sf pos}\left(\br^{\sf T}\bx\right)$ for a random unit vector $\br$ -- has expected accuracy at least $\alpha$ for any weight assignment to the bags, and therefore there is some halfspace achieving accuracy at least $\alpha$.
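
The last step relies on a standard fact about random hyperplanes, recalled here for convenience; the symbols $\theta$ and $\bx'$ are introduced only for this illustration. For unit vectors $\bx, \bx'$ at angle $\theta \in [0, \pi]$ and a uniformly random unit vector $\br$,
\[
\Pr_{\br}\left[{\sf pos}\left(\br^{\sf T}\bx\right) \neq {\sf pos}\left(\br^{\sf T}\bx'\right)\right] = \frac{\theta}{\pi}.
\]
Hence a bag whose two feature-vectors are at angle at least $\alpha\pi$ is satisfied with probability at least $\alpha$, and by linearity of expectation the same lower bound holds for the weighted accuracy under any weight assignment.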

\noindent
{\bf Impossibility of Boosting in MIL} (Theorem \ref{thm:MIL-imposssibility}). Since the aggregation function is ${\sf OR}$, the Max-Cut construction of \cite{Feige2002} is not applicable. Instead, we hand-craft the set of bags as follows. The set of feature-vectors is all points on the unit circle $\mathbb{S}^1$, and for some $\alpha \in (1/2, 1)$ we create a bag with two points if the angle between them is exactly $\alpha \pi$ and give an aggregate label $1$ to all such $2$-sized bags (let us call them $1$-bags). We also construct $2$-sized bags with aggregate label $0$ when the angle between two points is exactly $(1 - \alpha)\pi$ (called $0$-bags). %
If we consider any reweighted collection of these bags then %
a simple threshold-based case analysis yields a weak classifier of accuracy $2/3 - (1-\alpha)/2$. To rule out any strong classifier, we consider a labeling where a $z$-fraction of the points in $\mathbb{S}^1$ are labeled $1$. We show that the maximum possible accuracy is $3/4$, which is achieved at $z = 1/2$. We choose $\alpha = 1 - \eps$, while losing an additional error of $\eps/2$ in the weak-classifier accuracy due to discretization, to obtain the desired bounds.
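
As a quick check of the parameters, and assuming that the discretization loss of $\eps/2$ is simply subtracted from the weak-classifier accuracy, setting $\alpha = 1 - \eps$ gives
\[
\frac{2}{3} - \frac{1-\alpha}{2} - \frac{\eps}{2} \;=\; \frac{2}{3} - \frac{\eps}{2} - \frac{\eps}{2} \;=\; \frac{2}{3} - \eps .
\]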

\noindent
{\bf Weak to Strong LLP Learning} (Theorem \ref{thm:weaktostrong}). The main idea is, given a collection of original bags $\mc{B}$, to construct all possible composite bags which are unions of up to $t$ bags from $\mc{B}$. Note that the aggregate label for the union is simply the sum of the aggregate labels of the constituent bags, and the error of a classifier w.r.t. the aggregate label on the union of bags is the sum of its errors on the constituent bags. Let $f$ be a classifier with accuracy $\gamma > 0$ on the composite bags, and assume for a contradiction that $f$ has accuracy less than $(1- \eps)$ on $\mc{B}$, for some $\eps > 0$. Call the bags in $\mc{B}$ on which $f$ has a non-zero error (an element of $\mathbb{Z}\setminus\{0\}$) w.r.t. the aggregate label the \emph{error} bags. Now, if $t$ is large enough, then a random set of $t$ bags from $\mc{B}$ has, with high probability, $\approx \eps t$ error bags. Using a sampling argument we show that the error on the union of $t$ random bags from $\mc{B}$ is distributed like a random Bernoulli combination of the errors on $\approx 2\eps t$ bags. We then apply the Littlewood-Offord-Erd\H{o}s anti-concentration lemma to obtain that, with probability at least $1 - O\left(1/\sqrt{\eps t}\right)$, the union of the bags has non-zero error induced by $f$. By choosing $t$ large enough we obtain a contradiction with the accuracy of $\gamma$ on the composite bags. Standard sampling techniques can be applied to obtain a more efficient procedure with high-probability guarantees.
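
For orientation, the form of the anti-concentration bound used in this step can be sketched as follows; the symbols $a_i$, $b_i$, $m$ and $c$ are introduced only for this illustration and are not the notation of the formal proof. Let $a_1, \dots, a_m$ be non-zero integers (playing the role of the errors of $f$ on the error bags) and let $b_1, \dots, b_m$ be independent uniform $\{0,1\}$-valued random variables. The Littlewood-Offord-Erd\H{o}s lemma then gives, for every integer $c$,
\[
\Pr\left[\sum_{i=1}^{m} b_i a_i = c\right] \;\leq\; \frac{1}{2^m}\binom{m}{\lfloor m/2 \rfloor} \;=\; O\left(\frac{1}{\sqrt{m}}\right).
\]
Taking $c = 0$ and $m$ on the order of $\eps t$ shows that the total error on a random composite bag vanishes with probability only $O\left(1/\sqrt{\eps t}\right)$.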

We also note here that the above algorithmic techniques are inapplicable to the MIL setting (see Appendix \ref{sec:mil-limit}). In Appendix \ref{sec:multiclass} we informally describe how our results and techniques can be applied to multi-class classification settings of LLP and MIL, in which the aggregate label of a bag is a histogram over the label-set.

