\appendix
\onecolumn
{
%\nouppercaseheads

\section*{\centering \large{Supplementary: \papertitle}}

\iffalse%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Related Works (Detailed)}
\label{app:rel}
Ranking from pairwise comparisons has been studied extensively in various disciplines owing to its huge practical importance; reviewing all of this literature lies beyond the scope of this work. We review only the works most relevant to our setting. The most related work is 
\cite{niranjanRa17}, however, they assume the features to lie in some low dimensional space and use a matrix completion-based approach to predict the ranking. Note that the low-rank assumption is a \emph{global} assumption on the features that might miss out completely on the exact dependencies on the items. 
\cite{chiang+17} also consider a feature preference information model, but do not analyze the graph theoretic aspects of feature dependencies.% and their guarantees are again w.r.t the feature dimension $d$, whereas ours is in terms of number of 'independent items' $\alpha$ and thus much tighter when $d >> \alpha$. %Furthermore, our graph theoretic analysis also captures the interaction between the I(G) and non-I(G) nodes explicitly (through Thm 3.2), which they cannot.
%As we will see, the set of features in a low dimensional space might give rise to very different type of relation graphs which may lead to very different sample complexity bounds that our analysis will capture while theirs does not.   
\cite{gleichLek11}, \cite{borkarNi16} also use a least squares-based approach, but without any feature information. \cite{negahban+12,wauthier+13,busaHu14,rajkumarAg14,shahWa15} \cite{chenJo16}, \cite{rajkumarAg16}, \cite{shah+16} work in the pairwise ranking setting under different probabilistic models (including BTL model), but again none of them use features explicitly and hence are sub-optimal for our setting (as we will see in the experiments).
\cite{jamiesonNo11} work in a setting where the probabilities come from some unknown low-dimensional feature embedding of the items. However, they require the pairs to be queried actively, whereas our work focuses on random (passive) selection of pairs.  There is also a rich ranking literature on noisy sorting \cite{bravermanMo08}, approximation algorithms \cite{ailon08}, dueling bandits \cite{yue12} etc., which are fundamentally different from the passive setting under the BTL model considered here.
Table \ref{tab:sum_con} summarizes the sample complexities of a few related works.


\vspace*{-4pt}
%\iffalse
\begin{table}[h]
\vspace*{-4pt}
%\hspace*{-50pt}
\begin{center}
\scalebox{0.6}{
\begin{tabular}{|c|c|c|}
\hline
\textbf{Ranking} & \textbf{Sampling}  & \textbf{Sample} \\
\textbf{Model} & \textbf{Technique}  & \textbf{Complexity} \\
\hline
 Noisy permutation \cite{bravermanMo08} & Active  & $O(n \log n)$ \\
\hline
 Low $d$-dimensional embedding \cite{jamiesonNo11} & Active  & $O(d \log^2 n)$ \\
\hline
 Deterministic tournament \cite{ailon08} & Active  & $O(n \text{poly}(\log n))$\\
\hline
 Rank-$r$ preference with $\nu$ incoherence \cite{gleichLek11} & Passive  & $O(n\nu r(\log n)^2)$ \\
\hline
 Bradley Terry Luce (BTL) \cite{negahban+12} & Passive  & $O(n \log n)$\\
\hline
 Noisy permutation \cite{wauthier+13} & Passive & $O(n \log n)$\\
\hline
 Low $r$-rank pairwise preference \cite{rajkumarAg16} & Passive & $O(nr \log n)$\\
\hline
 Low $d$-rank feature with BTL \cite{niranjanRa17} & Passive & $O(d^2 \log n)$\\
\hline
Rank aggregation balancing features \cite{chiang+17} & Passive & $O(n)$\\
\hline
\textbf{f-BTL} ($\alpha$ `independent items') [This work] & Passive  & $O(\alpha \log \alpha)$\\
\hline
\end{tabular}}
\vspace*{-10pt}
    \caption{State-of-the-art vs Our work}
\label{tab:sum_con}
\end{center}
\end{table}
\vspace*{-10pt}
\fi%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Supplementary for Section \ref{sec:no_noise}}
\label{app:no_noise}

\subsection{Proof of Theorem \ref{thm:halls-equations}}

\hallseqs*

\begin{proof}
If there is a matching that covers $A$, then each node $i$ in $\cI(G)$ has a distinct representative edge in $M$, which induces an equation containing $i$. Thus there are at least $|\cI(G)|$ equations, with each node appearing in at least one of them, and hence the system can be solved. Moreover, the solution is unique since these $|\cI(G)|$ induced equations are linearly independent. It is important to note in this regard that every equation (of the form of Eqn. \eqref{eqn:linear}) that emerges from a pair $(i,j)$ is linearly independent of the others: this holds since we also assume that any $\alpha \times \alpha$ submatrix of $\bB$ has rank $\alpha$ (see Sec. \ref{sec:prb_set}), which ensures that none of the dependent features can be represented as a linear combination of the other dependent features. This is crucial for the correctness of the proof, as it ensures that all the linear equations induced by the edges of the covering matching are linearly independent. 

On the other hand, if there is no matching that covers $A$, then by Hall's marriage theorem \cite{hall1935} there must exist some subset $S \subseteq A$ whose neighbourhood satisfies $|N_{C_M}(S)| < |S|$. As the total number of equations that involve nodes in $S$ is less than the number of nodes in $S$, this set of equations cannot be solved uniquely. %or precisely $\cI(G)$ is maximal independent set of the independent nodes and all the dependent items $[n]\setminus \cI(G)$, can only be represented as a unique linear combination of the independent nodes $\cI(G)$. 
\end{proof}
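
{\bf Illustration (hypothetical toy instance).} As a sanity check of the second direction of the proof, suppose $A = \{1,2\}$ and that the sampled set $M$ contains a single pair $e$ which is adjacent in $C_M$ to both nodes of $A$. This instance is an assumption made purely for illustration and is not taken from the main text. Taking $S = A$ gives
\[
|N_{C_M}(S)| = |\{e\}| = 1 < 2 = |S|,
\]
so Hall's condition fails: the single induced equation involves two unknown scores and cannot determine both of them uniquely. If instead $M = \{e, e'\}$ with $e'$ adjacent to node $2$, then $\{(1,e),(2,e')\}$ is a matching that covers $A$.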

\subsection{Proof of Theorem \ref{thm:sampcomp}}
\label{app:nonoise_thm}

\sampcomp*

\begin{proof}

Note from Theorem \ref{thm:halls-equations} that one fails to recover the true $\btheta$ if and only if the edge set $\Delta_{M}$ of the bipartite graph $C_M$ 
fails to cover $A$.
%every $(\alpha-1)$ subsets of $A$, which in turn implies that $\Delta_{M}$ must fail cover $A$ as well. 
Thus we have
\begin{align*}
\bP(\btheta \neq \bhtheta) & = \bP(\{A \text{ is not covered by } C_M\})\\
& = \bP(\{\exists S' \subseteq A \mbox{ s.t. } |N_{C_M}(S')| < |S'|\}) ~~~(\text{by Hall's Marriage Theorem})
\end{align*}

We use $N_G(i)$ to denote the set of neighbours of node $i \in [n]$ in a graph $G$ and $\bar N_G(i)$ to denote the set of neighbours of node $i \in [n]$ in $G$ including $i$ itself, i.e. $\bar N_G(i) = N_G(i)\cup \{i\}$. Define $N_G(S) = \cup_{i \in S}N_G(i), ~\forall S \subseteq V(G)$ and $\bar N_G(ij) = \left(\bar N_G(i) \cup \bar N_G(j)\right) \cap \mathcal{I}(G)$. Thus we can associate every node $k \in \mathcal{I}(G) = [\alpha(G)]$ in the independent set to a set of edges $M_k$ such that $(i,j) \in M_k \iff  k \in \bar N_G(ij)$. 
Let us also denote $n_k = |M_k|$ and let $n_{\min} = \displaystyle \min_{\{k \in [\alpha(G)]\}} n_k$. More generally we denote $n_{I} = |\cap_{i \in I}M_i|, ~\forall I \subseteq [\alpha(G)]$.

We will also find it convenient to define $c_{I} = |\cup_{i \in I}M_i|$ and $d_{I} = |\cap_{i \in I}M_i|$, $\forall I \subseteq [\alpha(G)]$. Clearly when $|I| = 1$, say $I = \{i\}, ~i \in [\alpha(G)]$, $c_I = d_I = n_i$. In general, for $|I|=q, ~1 \le q \le \alpha(G)$, the inclusion-exclusion principle gives $c_{I} = \sum_{x = 1}^{q}\sum_{J \subseteq I \mid |J|=x}(-1)^{x-1}d_J$, where the sizes $d_J$ of the intersections depend on the specific structure of the graph $G$ (see Theorem \ref{thm:sampcomp_eg} for a graph-specific analysis). 

Now if we denote the event $F_i := \{\exists S' \subseteq A \mbox{ s.t. } |S'| = i \text{ and } S'\mbox{ is not covered by } C_M\}$, $\forall i \in [\alpha(G)]$, and recalling $A = [\alpha(G)]$, we further get

\begin{align}
\label{eq:sampcom_prf1}
\nonumber \bP(\btheta \neq \bhtheta) & = \bP(\{ \exists S' \subseteq A \mbox{ s.t.} ~|N_{C_M}(S')| < |S'|\}) \\
\nonumber & = \bP(F_1 \cup F_2 \cup F_3 \cup \ldots \cup F_{\alpha(G)})\\
\nonumber & = \bP\big(F_1 \cup (F_2 \cap F_1^c ) \cup (F_3 \cap F_2^c ) \cup \ldots \cup (F_{\alpha(G)} \cap F_{\alpha(G)-1}^c) \big)\\ 
& \le \bP(F_1) + \bP(F_2 \cap F_1^c) + \ldots + \bP(F_{\alpha(G)} \cap F_{\alpha(G)-1}^c)
\end{align}

Assuming the pairwise node preferences are drawn according to the edges sampled from an Erd\H{o}s-R\'enyi random graph $\mathcal G(n,p)$ and applying Theorem \ref{thm:halls-equations} on the event $F_1$, it is easy to see that 
\begin{align*}
& \bP(F_1) = \bP(\{\exists S' \subseteq A \mbox{ s.t. } |N_{C_M}(S')| < |S'| = 1\})\\ 
& = \bP\Big(\{\exists S' = \{k\}, ~k \in [\alpha(G)] \mbox{ s.t. no edge from } M_k \mbox{ is sampled in } \mathcal G(n,p)\}\Big) \le \sum_{i = 1}^{\alpha(G)}(1-p)^{n_i},
\end{align*}
where the last inequality follows by taking a union bound over all singletons in $A = [\alpha(G)]$. Note that one can further bound the above as $\bP(F_1) \le \alpha(G)\exp(-pn_{\min})$. 
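
To spell out the exponential bound on $\bP(F_1)$ (a routine calculation added for completeness, using only $1-p \le e^{-p}$ for $p \in [0,1]$ and $n_i \ge n_{\min}$):
\[
\sum_{i = 1}^{\alpha(G)}(1-p)^{n_i} \le \sum_{i = 1}^{\alpha(G)}e^{-pn_i} \le \alpha(G)e^{-pn_{\min}}.
\]
In particular, $\bP(F_1) \le \delta'$ as soon as $p \ge \frac{1}{n_{\min}}\log \big( \frac{\alpha(G)}{\delta'} \big)$, where $\delta' \in (0,1)$ is a hypothetical target failure level introduced here only for illustration.
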
In general, for any $2 \le q \le \alpha(G)$, one can similarly derive
\begin{align}
\label{eq:sampcom_prf2}
\nonumber & \bP(F_q \cap F_{q-1}^c) \\
\nonumber & = \bP\big(\{\exists S' \subseteq A, |S'| = q, ~S' \text{ is not covered by } C_M \mbox{ and } \forall S'_1 \subset A, |S_1'| < q, ~S_1' \text{ is covered by } C_M\}\big) \\
& \le \sum_{I \subseteq \cI(G) \mid |I|=q}\binom{d_I}{q-1}p^{q-1}(1-p)^{c_I-(q-1)},
\end{align}

where the last inequality follows from the crucial observation that, for any $S' \subseteq A$ with $|S'| = q$, if $S'$ is not covered by $C_M$ but all its proper subsets $S_1' \subset S'$ are, then $\mathcal{G}(n,p)$ must have sampled exactly $q-1$ edges from $\cap_{i \in S'}M_i$ and none from $\big( \cup_{i \in S'}M_i \setminus \cap_{i \in S'}M_i \big)$.
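
For a fixed $I = S'$ with $|I| = q$, the probability of this event can be sketched as follows, under the $\mathcal G(n,p)$ sampling assumed above (each candidate pair is included independently with probability $p$):
\begin{align*}
\underbrace{\binom{d_I}{q-1}p^{q-1}(1-p)^{d_I-(q-1)}}_{\text{exactly } q-1 \text{ of the } d_I \text{ common edges}} \cdot \underbrace{(1-p)^{c_I-d_I}}_{\text{none of the other } c_I-d_I \text{ edges}} = \binom{d_I}{q-1}p^{q-1}(1-p)^{c_I-(q-1)},
\end{align*}
which matches the summand in the final bound of the theorem; a union bound over all $I$ of size $q$ then gives the sum over $I$.
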
Using \eqref{eq:sampcom_prf2} in \eqref{eq:sampcom_prf1} we finally get,

\begin{align*}
\bP(\btheta \neq \bhtheta) & \le \bP(F_1) + \bP(F_2 \cap F_1^c) + \ldots + \bP(F_{\alpha(G)} \cap F_{\alpha(G)-1}^c) \\
& = \sum_{q = 1}^{\alpha(G)}\sum_{I \subseteq \cI(G) \mid |I|=q}\binom{d_I}{q-1}p^{q-1}(1-p)^{c_I-(q-1)},
\end{align*}

where we assume ${x \choose y} = 0$, if $x < y$. Further note that if $d_{\max}(G) < \alpha(G)$, then for any $I \subseteq [\alpha(G)]$ such that $|I| > (d_{\max}+1)$, we have $d_{I} = 0$, using which we further get

\begin{align*}
\bP(\btheta \neq \bhtheta) \le \sum_{q = 1}^{\min\{\alpha(G),~(d_{\max}(G)+1)\}}\sum_{I \subseteq \cI(G) \mid |I|=q}\binom{d_I}{q-1}p^{q-1}(1-p)^{c_I-(q-1)}
\end{align*} 
Thus the claim follows.
\end{proof}

\subsection{Proof of Theorem \ref{thm:sampcomp_eg}}
\label{app:egs}
\sampcompeg*

\begin{proof}
We will now analyse Theorem \ref{thm:sampcomp} for certain specific classes of graphs, using the same notation as in the proof of Theorem \ref{thm:sampcomp}.
\begin{enumerate}
\item {\bf Fully Disconnected Graph:} Note that in this case $\alpha(G) = n$. Also note that $\forall k \in [n], ~M_k = \{(k,i) \mid i \in [n]\setminus\{k\}\}$. Thus $n_k = n-1$. Moreover $\forall I \subseteq [n]$, $|I| = 2$, $c_I = 2n-3$, $d_I = 1$, and if $|I| \ge 3$, $d_I = 0$. 

Now applying Theorem \ref{thm:sampcomp} and noting $d_{\max}(G) = 0$, we further get that,

\begin{align*}
\bP(\btheta \neq \bhtheta)
& \le \sum_{q = 1}^{\min\{\alpha(G),~(d_{\max}(G)+1)\}}\sum_{I \mid |I|=q}\binom{d_I}{q-1}p^{q-1}(1-p)^{c_I-(q-1)}\\
& = \sum_{i = 1}^{n}(1-p)^{n-1} + \sum_{i < j}p(1-p)^{2n-3-1}\\
& = n(1-p)^{n-1} + \binom{n}{2}p(1-p)^{2n-4}\\
& \le n(e^{-p})^{n-1} + \frac{n(n-1)}{2} p(e^{-p})^{2n-4}\\
& \le n^2(e^{-p(n-1)})\\
& \le \delta,
\end{align*}

solving which we get $p \ge \frac{1}{(n-1)}\log \Big( \frac{n^2}{\delta} \Big)$. Thus the expected number of edges (pairwise preferences) required in the random graph is at least $p\binom{n}{2} \ge \frac{n}{2} \log \Big( \frac{n}{\delta}\Big)$, which recovers the result for the usual BTL model.% where the expected number of required edges (pairwise preferences) in the random graph is $\frac{n}{2}\log\left(\frac{n}{\delta}\right)$

\item {\bf Complete Graph:} In this case $\alpha(G) = 1$. Without loss of generality assume $\cI(G) = \{1\}$, so that $M_1 = \{(i,j) \mid i,j \in [n], ~i < j\}$ and $n_1 = \binom{n}{2}$. Moreover $\forall I \subseteq [n]$, $|I| \ge 2$, $d_I =  0$. 

Applying Theorem \ref{thm:sampcomp} as before and noting $d_{\max}(G) = n-1$, we further get,

\begin{align*}
\bP(\btheta \neq \bhtheta)
& \le \sum_{q = 1}^{\min\{\alpha(G),~(d_{\max}(G)+1)\}}\sum_{I \mid |I|=q}\binom{d_I}{q-1}p^{q-1}(1-p)^{c_I-(q-1)}\\
& = (1-p)^{\binom{n}{2}}\\
& \le (e^{-p})^{\binom{n}{2}}\\
& \le \delta,
\end{align*}

solving which one gets $p \ge \frac{1}{\binom{n}{2}}\log \Big( \frac{1}{\delta} \Big)$. Thus the expected number of edges (pairwise preferences) required in the random graph is at least $p\binom{n}{2} \ge \log \Big( \frac{1}{\delta}\Big)$, which is intuitive as well, since in a complete graph one needs the knowledge of only a constant number (independent of $n$) of pairwise preferences to recover the exact ranking (i.e. $\btheta$) with high probability $(1-\delta)$.

\item {\bf $r$-Disconnected Cliques:} Say $G$ has exactly $r \in [n]$ disconnected cliques, $G_1, G_2, \ldots G_r$, each with $d \in [n]$ nodes (i.e. for each $k \in [r], ~|V(G_k)| = d$), so that $n = rd$. Thus in this case $\alpha(G) = r$. Without loss of generality assume $\cI(G) = \{1,2, \ldots r\}$. Then $\forall k \in [r]$, we have $M_k = \{(i,j) \mid (i,j) \in E(G_k)\} \cup \{(k,j) \mid j \in [n]\setminus \{k\}\}$. Thus $n_k = \binom{d}{2} + (r-1)$. Moreover $\forall I \subseteq [r]$, $|I| = 2$, $c_I = 2(\binom{d}{2} + (r-1)) - 1 = d(d-1) + (2r-3)$, $d_I =  1$, and for $|I| \ge 3$, $d_I = 0$. 

Then applying Theorem \ref{thm:sampcomp} as above and noting $d_{\max}(G) \le \lceil \frac{n}{r} \rceil$, we further get,

\begin{align*}
\bP(\btheta \neq \bhtheta) & \le \sum_{q = 1}^{\min\{\alpha(G),~(d_{\max}(G)+1)\}}\sum_{I \mid |I|=q}\binom{d_I}{q-1}p^{q-1}(1-p)^{c_I-(q-1)}\\
& = \sum_{i = 1}^{r}(1-p)^{\binom{d}{2}+r-1} + \sum_{i<j, i,j \in [r]}p(1-p)^{d(d-1)+(2r-3)-1}\\
& = r(1-p)^{\binom{d}{2}+r-1} + \binom{r}{2}p(1-p)^{d(d-1)+(2r-4)}\\
& \le r(e^{-p})^{\binom{d}{2}+r-1} + \frac{r(r-1)}{2} p(e^{-p})^{d(d-1)+(2r-4)}\\
& \le r(r-1)(e^{-p(\binom{d}{2}+r-1)})\\
& \le r^2(e^{-p(\binom{d}{2}+r-1)}) \le \delta,
\end{align*}
solving which one can derive $p \ge \frac{1}{\binom{d}{2} + (r-1)}\log \Big( \frac{r^2}{\delta} \Big)$. Thus the expected number of edges (pairwise preferences) required in the random graph is at least $p\binom{n}{2} = \frac{n(n-1)/2}{d(d-1)/2 + r-1}\log \Big( \frac{r^2}{\delta}\Big) \ge \frac{n(n-1)r^2}{n(n-r)+2r^2(r-1)} \log \Big( \frac{r^2}{\delta}\Big)\ge r \log \Big( \frac{r^2}{\delta} \Big)$, where the last inequality follows assuming $r < \frac{n}{\sqrt{2}}$. Note that setting $d = 1$ and $d = n$, one can recover the earlier bounds we derived for disconnected and complete graphs respectively.

\item {\bf Star:} Note that in this case the size of the maximum independent set is $\alpha(G) = (n-1)$. Without loss of generality assume $\cI(G) = [n]\setminus\{1\}$. Thus we have that for any $k \in \cI(G)$, $M_k = \{(k,j) \mid j \in [n]\setminus\{k\}\} \cup \{(1,j) \mid j \in [n]\setminus\{1\}\}$. Thus $n_k = (n-1)+(n-2) = 2n-3$. Moreover $\forall I \subseteq \cI(G)$, $|I| = 2$, $d_I =  (n-2)+1 = n-1$ and $c_I = 2(2n-3)-(n-1) = 3n-5$. For $|I| \ge 3$, $d_I = n-2$ and $c_I = (2n-3)|I| - (n-1)\binom{|I|}{2} + (n-2)\binom{|I|}{3} - \ldots + (-1)^{|I|-1}(n-2)$, e.g. when $|I| = 3, ~c_{I} = 4n-8$ etc.

Applying Theorem \ref{thm:sampcomp} as before and noting $d_{\max}(G) = (n-1)$, we further get,

\begin{align*}
\bP(\btheta \neq \bhtheta)
& \le \sum_{q = 1}^{\min\{\alpha(G),~(d_{\max}(G)+1)\}}\sum_{I \mid |I|=q}\binom{d_I}{q-1}p^{q-1}(1-p)^{c_I-(q-1)}\\
& = (n-1)\Big((1-p)^{2n-3} + \frac{n(n-2)}{2}p(1-p)^{2n-4}\Big)\\
& + \binom{n-1}{3}\binom{n-2}{2}p^2(1-p)^{3n-12} + \ldots \\
& \le n^2(e^{-p(n-1)})\\
& \le \delta.
\end{align*}

Similar to the case of the {\it fully disconnected graph}, solving for $p$ from the above, one gets that the expected number of edges (pairwise preferences) required in the random graph is at least $p\binom{n}{2} = \Big(\frac{n}{2}\log \Big( \frac{n}{\delta}\Big)\Big)$.

\item {\bf Cycle:} We will assume that $n = 2n' \ge 4$ is even; a similar analysis can be done for an odd number of nodes as well. Thus in this case $\alpha(G) = n'$. Without loss of generality assume $\cI(G) = \{2i \mid i \in [n'] \}$. Thus we have that for any $k \in \cI(G)$, $M_k = \{(k,j) \mid j \in [n]\setminus\{k\}\} \cup \{(k-1,j) \mid j \in [n]\setminus\{k-1\}\} \cup \{((k+1)\bmod n,j) \mid j \in [n]\setminus\{(k+1)\bmod n\}\}$. Thus $n_k = (n-1)+(n-2)+(n-3) = 3(n-2)$. Moreover $\forall I \subseteq \cI(G)$, $|I| = 2$, $d_I =  (n-2)+1 = n-1$ and $c_I = 2(3n-6)-(n-1) = 2n-5$. For $|I| \ge 3$, $d_I = 0$.

Applying Theorem \ref{thm:sampcomp} and noting $d_{\max}(G) = 2$, we further get,

\begin{align*}
\bP(\btheta \neq \bhtheta)
& \le \sum_{q = 1}^{\min\{\alpha(G),~(d_{\max}(G)+1)\}}\sum_{I \mid |I|=q}\binom{d_I}{q-1}p^{q-1}(1-p)^{c_I-(q-1)}\\
& = \sum_{i = 1}^{n'}(1-p)^{3n-6} + \sum_{I \subset \cI(G), |I|=2}(n-1)p(1-p)^{2n-5 -(n-1)}\\
& = \frac{n}{2}(1-p)^{3n-6} + \frac{n(n-1)(n-2)}{8}p(1-p)^{n-4}\\
& \le \frac{n}{2}(1-p)^{3n-6} + \frac{n-2}{4}(1-p)^{n-4} ~~\big(\text{ as } p\binom{n}{2} \ge 1\big)\\
& \le n(1-p)^{n-4}\\
& \le \delta,
\end{align*}
solving which one can derive $p \ge \frac{1}{n-4}\log \Big( \frac{n}{\delta} \Big)$. Thus the expected number of edges (pairwise preferences) required in the random graph is at least $p\binom{n}{2} = \frac{n(n-1)/2}{n-4}\log \Big( \frac{n}{\delta}\Big) \ge \frac{n}{2}\log \Big( \frac{n}{\delta}\Big)$.

\item {\bf $K$-ary Tree:} Let $h$ be the height of the tree and let $1$ denote the root node. For any node $i \in [n]$, $par(i)$ and $ch(i)$ respectively denote the parent and the children of node $i$. We consider only trees of even height; a similar analysis can be derived for trees of odd height. Note that $n = (1 + K + K^2 + \ldots + K^h) = \frac{K^{h+1}-1}{K-1}$. Clearly the maximum independent set contains all the nodes which are at an even distance from the root, including the root itself. Thus $\alpha(G) = (1 + K^2 + K^4 + \ldots + K^h) = \frac{K^{h+2}-1}{K^2-1}$. 

Note that for any $k \in \cI(G)$, $N_G(k)\cap \cI(G)^c = par(k)\cup ch(k)$. Also, the nodes in $[n] \setminus \cI(G)$ form $C = (K + K^3 + K^5 + \ldots + K^{h-1}) = \frac{K(K^{h}-1)}{K^2-1}$ clusters, which we denote by $H_1, H_2, \ldots H_C$, such that for any $i \in [n]\setminus\cI(G)$, $H_i = \{ j \in \cI(G) \mid j \in par(i)\cup ch(i) \}$. Thus $|H_i| = K+1$. We will also abbreviate $N_G(\cdot)$ as $N(\cdot)$ for ease of notation.

Thus for any $k \in \cI(G)$, $M_k = \{(k,j)\mid j \in [n]\setminus\{k\}\} \cup_{k' \in par(k)\cup ch(k)}\{(k',j)\mid j \in [n]\setminus\{k'\}\}$. This gives that $n_k = (n-1) + \sum_{i = 2}^{K+2}(n-i) = (K+2)\frac{2n-K-3}{2}$. Moreover, for any $I \subseteq \cI(G), |I| = 2$, 

\[
d_I =
\begin{cases}
1+(n-2), & \text{if } |N(i)\cap N(j)| = 1 ~\forall i,j \in I,\\
1, & \text{otherwise},
\end{cases}
\]

\[
c_I =
\begin{cases}
(K+1)(2n-K-2), & \text{if } |N(i)\cap N(j)| = 1 ~\forall i,j \in I,\\
(K+2)(2n-K-3) - 1, & \text{otherwise},
\end{cases}
\]

for any $I \subseteq \cI(G), 3 \le |I| \le K+1$,

\[
d_I =
\begin{cases}
(n-2), & \text{if } |N(i)\cap N(j)| = 1 ~\forall i,j \in I,\\
0, & \text{otherwise},
\end{cases}
\]

\[
c_I =
\begin{cases}
|I|(K+2)\frac{(2n-K-3)}{2} - \binom{|I|}{2}(n-1) + \ldots + (-1)^{|I|-1}(n-2), & \text{if } |N(i)\cap N(j)| = 1 ~\forall i,j \in I,\\
|I|(K+2)\frac{(2n-K-3)}{2} - \binom{|I|}{2}, & \text{otherwise},
\end{cases}
\]

and for any $I \subseteq \cI(G), |I| > K+1$, $d_I = 0$. Now applying Theorem \ref{thm:sampcomp} as before and noting $d_{\max}(G) = K$ we further get,

\begin{align*}
\bP(\btheta \neq \bhtheta)
& \le \sum_{q = 1}^{\min\{\alpha(G),~(d_{\max}(G)+1)\}}\sum_{I \mid |I|=q}\binom{d_I}{q-1}p^{q-1}(1-p)^{c_I-(q-1)}\\
& = \alpha(G)(1-p)^{\frac{(K+2)(2n-K-3)}{2}} \\
& \qquad + C\binom{K+1}{2}(n-1)p(1-p)^{(K+1)(2n-K-2)-(n-1)}\\
& \qquad + \Big(\binom{n}{2} - C\binom{K+1}{2}\Big)p(1-p)^{(K+2)(2n-K-3)-1}\\
& \qquad + \sum_{K' = 3}^{K+1}C\binom{K+1}{3}(n-2)p^2(1-p)^{c_I-2}\\
& \le \delta.
\end{align*}
Unlike the previous cases, this does not reduce to a non-trivial closed-form bound on $p$ from which a general sample complexity bound for any $K$-ary tree could be derived; however, one may use the above to get sample complexities for specific choices of $h$ and $K$.
\end{enumerate}
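
{\bf Numerical illustration.} For a rough sense of scale, the following plugs the hypothetical values $n = 100$ and $\delta = 0.1$ (chosen here only for illustration, with $\log$ the natural logarithm and $p$ taken at its threshold) into the expressions derived above for the first two cases:
\begin{align*}
\text{Fully disconnected:} \quad & p = \frac{1}{99}\log(10^5) \approx 0.116, \qquad p\binom{n}{2} = 50\log(10^5) \approx 575.6,\\
\text{Complete:} \quad & p = \frac{1}{4950}\log(10) \approx 4.7\times 10^{-4}, \qquad p\binom{n}{2} = \log(10) \approx 2.3.
\end{align*}
For comparison, the stated lower bound $\frac{n}{2}\log\big(\frac{n}{\delta}\big)$ for the fully disconnected case evaluates to $50\log(10^3) \approx 345.4$ for these values.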

\end{proof}

\section{Supplementary for Section \ref{sec:noise}}
\label{app:noise}

\subsection{Proof of Theorem \ref{thm:fbtlls}}
\label{app:noise_thm}

\fbtlls*

\begin{proof}
Let us denote the {\it reduced Laplacian} matrix by $\btL = \tbQ\tbQ^T$. 
With $\bL = \bQ\bQ^T$ being the original graph Laplacian, the {\it reduced Laplacian} is given by $\btL = \bB^{T}\bQ\bQ^T\bB = \bB^{T}\bL\bB$, which is clearly positive semi-definite. Define $f(\bx) = \|\tilde{\bQ}^{T}\bx - \hat{\by} \|^2$. Note that $\bhv = \arg \min_{\bx \in \bR^{\alpha}}f(\bx)$ in Algorithm \ref{alg:fbtlls} satisfies the optimality condition $\nabla f(\bhv) = 0$, that is,
\begin{align}
\label{eq:prf_ls_1}
\tbQ \bhy = \tbQ\tbQ^T \bhv = \btL\bhv.
\end{align}
On the other hand, let $\bv \in \bR^{\alpha}$ be such that $v_i = \theta_i, ~\forall i \in [\alpha]$, and let $\by \in \bR^{m}$ be such that $y_{ij} = \log \bigg( \frac{P_{ij}}{P_{ji}} \bigg)$. Then we have $\bv = \arg \min_{\bx \in \bR^{\alpha}}\| \tbQ^T\bx - \by \|^2$, which gives
\begin{align}
\label{eq:prf_ls_2}
\tbQ \by = \btL \bv.
\end{align}
The above optimality condition holds since, for any sampled pair $(i,j)$, $y_{ij} = \theta_i - \theta_j$, and so $\by = \bQ^T\btheta = \bQ^T\bB \bv = \tbQ^T \bv$, where the second equality holds due to \eqref{eqn:scores}. Thus combining \eqref{eq:prf_ls_1} and \eqref{eq:prf_ls_2}, we get
\[
\tbQ (\by - \bhy) = \btL (\bv - \bhv)
\]
which further gives,
\[
(\by - \bhy)^T\tbQ^T\tbQ (\by - \bhy) = \|\tbQ (\by - \bhy)\|^2 = \|\btL (\bv - \bhv)\|^2 = (\bv - \bhv)^T\btL\btL^T(\bv - \bhv),
\]
from which we get
\begin{align}
\label{eq:prf_ls_3}
 \lambda_{\min}(\btL\btL^T) \|\bv - \bhv\|^2 \le \|\btL (\bv - \bhv)\|^2 = \|\tbQ (\by - \bhy)\|^2 
 \le \lambda_{\max}(\tbQ^T\tbQ) \|\by - \bhy\|^2
\end{align}
where $\lambda_{\min}(\btL\btL^T)$ is the smallest non-zero eigenvalue of the positive semi-definite matrix $\btL\btL^T$ and $\lambda_{\max}(\tbQ^T\tbQ)$ is the largest eigenvalue of $\tbQ^T\tbQ$.
Now from standard results on matrix eigenvalues, we know that the sets of non-zero eigenvalues of $\tbQ^T\tbQ$ and $\tbQ\tbQ^T$ are exactly the same, which implies $\lambda_{\max}(\tbQ^T\tbQ) = \lambda_{\max}(\tbQ\tbQ^T) = \lambda_n$.
Moreover, $\lambda_{\min}(\btL\btL^T) = (\lambda_{\min}(\btL))^2 = (\lambda_{\min}(\tbQ\tbQ^T))^2 = \lambda^2_1$. Thus from \eqref{eq:prf_ls_3}, we get
\begin{align}
\label{eq:prf_ls_4}
\|\bv - \bhv\| \le \frac{\|\by - \bhy\|\sqrt{\lambda_n}}{\lambda_1}.
\end{align}
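
For completeness, the step from \eqref{eq:prf_ls_3} to \eqref{eq:prf_ls_4} can be spelled out as follows (a short sketch, assuming $\lambda_1 > 0$, which the bound implicitly requires):
\[
\lambda_1^2\|\bv - \bhv\|^2 \le \lambda_n\|\by - \bhy\|^2 \quad\Longrightarrow\quad \|\bv - \bhv\| \le \frac{\sqrt{\lambda_n}}{\lambda_1}\|\by - \bhy\|,
\]
obtained by substituting $\lambda_{\min}(\btL\btL^T) = \lambda_1^2$ and $\lambda_{\max}(\tbQ^T\tbQ) = \lambda_n$, dividing by $\lambda_1^2$ and taking square roots.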

Now in order to bound $\|\by - \bhy\| = \sqrt{\sum_{(i,j) \in E}(y_{ij} - \hat y_{ij})^2}$, first recall from the definition of $y_{ij}$ that $y_{ij} = \log \Big( \frac{P_{ij}}{P_{ji}} \Big) = \log P_{ij} - \log P_{ji}$, for any edge $(i,j) \in M$. Similarly we have $\hat y_{ij} = \log \hat P_{ij} - \log \hat P_{ji}$. Thus we have,
\begin{align}
\label{eq:prf_ls_5}
\nonumber |y_{ij} - \hat y_{ij}| & = |(\log P_{ij}-\log \hat P_{ij}) - (\log P_{ji}-\log \hat P_{ji})| \\
& \le |(\log P_{ij}-\log \hat P_{ij})| + |(\log P_{ji}-\log \hat P_{ji})|
\end{align}

Let us denote $\nu_{ij} = |P_{ij}-\hat P_{ij}|$. Clearly $|P_{ji}-\hat P_{ji}| = \nu_{ij}$ since $P_{ij}+P_{ji} = \hat P_{ij}+\hat P_{ji} = 1$. Since the random variable $\hat P_{ij}$ is the average of $K$ samples from Bernoulli$(P_{ij})$, applying {\it Hoeffding's Inequality} we get
\begin{align}
\label{eq:prf_ls_6}
\bP\Big( \nu_{ij} \ge \eta  \Big) = \bP\Big( |P_{ij}-\hat P_{ij}| \ge \eta  \Big) \le 2e^{-2\eta^2K}
\end{align}
Now since $|\theta_i| \le b, ~\forall i \in [n]$, we have $\frac{1}{1+e^{2b}} \le P_{ij} \le \frac{e^{2b}}{1+e^{2b}}, ~\forall i,j \in [n]$. Also as $K \ge 6(1+e^{2b})^2\log n$,  using \eqref{eq:prf_ls_6} we further have
\begin{align}
\label{eq:prf_ls_6b}
\bP\Big( \nu_{ij} \ge \frac{P_{ij}}{2} \Big) \le \bP\Big( \nu_{ij} \ge \frac{1}{2(1+e^{2b})} \Big) \le \frac{2}{n^3}, ~\forall i,j \in [n]
\end{align}
The above thus implies that $\nu_{ij} = |P_{ij} - \hat P_{ij}| < \frac{P_{ij}}{2}$ with probability at least $(1-\frac{2}{n^3})$, for $K \ge 6\log n (1+e^{2b})^2$. Further, since $\nu_{ij} = \nu_{ji}$, using a union bound over all $m$ pairs in $M$, we get that this holds simultaneously for all pairs $(i,j) \in M$ with probability at least $\big( 1-\frac{2m}{n^3} \big)$, i.e.
\[
\bP\bigg(\forall (i,j) \in M, ~\nu_{ij} < \frac{P_{ij}}{2} \bigg) \ge \bigg( 1-\frac{2m}{n^3} \bigg).
\]
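
The constant in \eqref{eq:prf_ls_6b} can be checked directly (a verification added for completeness): taking $\eta = \frac{1}{2(1+e^{2b})}$ in \eqref{eq:prf_ls_6} and using $K \ge 6(1+e^{2b})^2\log n$,
\[
2\eta^2K \ge \frac{2\cdot 6(1+e^{2b})^2\log n}{4(1+e^{2b})^2} = 3\log n, \qquad\text{so}\qquad 2e^{-2\eta^2K} \le 2e^{-3\log n} = \frac{2}{n^3}.
\]
As a purely illustrative instance (the values $b = 1$ and $n = 100$ are assumptions, not taken from the main text), the requirement reads $K \ge 6(1+e^{2})^2\log 100 \approx 1945$ comparisons per sampled pair.
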
Define $g: (0,1] \to \bR$ such that $g(p) = \log(p), ~\forall p \in (0,1]$. Using Taylor's theorem (equivalently, the mean value theorem), one can obtain a $p^* \in [P_{ij} - \nu_{ij},P_{ij} + \nu_{ij}]$ such that
\begin{align*}
& \log \hat P_{ij} = \log P_{ij} + \frac{1}{p^*}(\hat P_{ij} -  P_{ij}), \text{ or equivalently,}\\
& \frac{\log(\hat P_{ij}) - \log P_{ij}}{(\hat P_{ij} -  P_{ij})}  = \frac{1}{p^*} \le \frac{2}{P_{ij}},
\end{align*}
where the last  inequality follows from \eqref{eq:prf_ls_6b} with  probability at least $(1-\frac{2m}{n^3})$.

Furthermore, on this high-probability event we have $|\hat{P}_{ij} - P_{ij}| < \frac{P_{ij}}{2}$, and thus
\[
|\log(\hat P_{ij}) - \log P_{ij}| \le 1, ~~\forall (i,j) \in M.
\]
Combining the above with \eqref{eq:prf_ls_5} we get
\[
|y_{ij} - \hat y_{ij}| \le 2,
\]
which implies $\|\by - \bhy \| \le 2\sqrt{m} $. 
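
In more detail (a short calculation added for completeness), on the high-probability event above the Taylor expansion gives, for every sampled pair,
\[
|\log \hat P_{ij} - \log P_{ij}| = \frac{|\hat P_{ij} - P_{ij}|}{p^*} \le \frac{P_{ij}}{2}\cdot\frac{2}{P_{ij}} = 1,
\]
and the same bound holds with the roles of $i$ and $j$ exchanged. Each of the $m$ coordinates of $\by - \bhy$ is therefore at most $2$ in absolute value, so that $\|\by - \bhy\|^2 \le 4m$.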
%where recall that $m = |E|$ is the total number of edges sampled.
%Again applying \eqref{eq:prf_ls_6} for any given arbitrary $\eta \le \frac{1}{2(1+e^{2b})^2}$, $K \ge 6(1+e^{2b})^2\log n$ and applying union bound over all $m$ pair of sampled edges $(i,j) \in M$, we get that
%\[
%\bP\big(\forall (i,j) \in M, ~\nu_{ij} \ge \eta\big) \le %2me^{-2\eta^2K}.
%\]
Applying the above to \eqref{eq:prf_ls_4} we thus get 
\begin{align}
\label{eq:prf_ls_7}
\|\bv - \bhv\| \le \frac{\|\by - \bhy\|\sqrt{\lambda_n}}{\lambda_1} \le \frac{2\sqrt{m\lambda_n}}{\lambda_1}
\end{align}
with probability at least $\big( 1 - \frac{2m}{n^3} \big) \ge \big( 1 - \frac{1}{n} \big)$, where the last inequality uses $m \le \binom{n}{2}$. Finally note that since $|\theta_i| \ge a, ~\forall i \in [\alpha]$, we have $\|\bv\| \ge a\sqrt{\alpha}$. Moreover, as $\btheta = \bB\bv$, $\|\btheta\| = \|\bB\bv\| \ge \sqrt{\lambda_{\min}(\bB^{T}\bB)}\|\bv\| \ge a\sqrt{\alpha\lambda_{\min}(\bB^{T}\bB)}$.
On the other hand, we have set $\bhtheta = \bB\bhv$ thus,
\[
\|\btheta - \bhtheta\| = \|\bB(\bv - \bhv)\| \le \sqrt{\lambda_{\max}(\bB^{T}\bB)}\|\bv - \bhv\|. 
\]
Combining above with \eqref{eq:prf_ls_7}, we finally have 
\begin{align*}
\frac{\|\btheta - \bhtheta\|}{\|\btheta\|} \le \frac{2\sqrt{m\lambda_n\lambda_{\max}(\bB^{T}\bB)}}{a\lambda_1\sqrt{\alpha \lambda_{\min}(\bB^{T}\bB)}},
\end{align*}
with probability at least $\big( 1 - \frac{2m}{n^3} \big)$ and the claim follows. 
\end{proof}

\section{Supplementary for Section \ref{sec:lb}}


\subsection{Proof of Theorem \ref{thm:lb}}
\label{app:lb}

\lb*

\begin{proof}
We solve the above problem by reducing it to a multi-class hypothesis testing problem as follows: Suppose we are given a set of $N$ score vectors $\{\btheta^1, \btheta^2, \ldots \btheta^{N}\} \subset \Theta_B(a,b)$ such that $\|\btheta^{k_1} - \btheta^{k_2}\| \ge \delta$ for any two distinct score vectors $\btheta^{k_1},\btheta^{k_2}$, $k_1,k_2 \in [N]$. Then, given the set of pairwise preferences generated by an unknown score vector $\btheta = \btheta^{L}$, where $L$ is a random index selected uniformly from the set $[N]$, the hypothesis testing task is to identify the index $L$ of the true score vector.

Now given any algorithm that predicts a score vector $\bhtheta$ based on the given set of pairwise preferences from the f-BTL model with score vector $\btheta^{L}$, sampled according to a $\mathcal G(n,p)$ Erd\H{o}s-R\'enyi random graph with $p=\frac{\zeta}{n}$ for some $\zeta > 0$, such that $K$ independent noisy pairwise preferences are available for each sampled pair, one natural way to estimate $L$ is by $\hat L = \arg \min_{k \in [N]}\| \bhtheta - \btheta^k \|$. Note that for $\hat L$ to be different from $L$, it has to be the case that $\| \bhtheta - \btheta \| \ge \frac{\delta}{2}$. Thus one can write

\[
\bE[\| \bhtheta - \btheta \|] \ge \frac{\delta}{2}\bP(\hat L \neq L).
\]

Further applying a similar information theoretic analysis as \cite{negahban+12}, one gets

\begin{align}
\label{eq:lb1}
\bE[\| \bhtheta - \btheta \|] \ge \frac{\delta}{2}\bigg[ 1 - \frac{\frac{K\zeta}{2N^2}\sum_{k_1 \in [N]}\sum_{k_2 \in [N]}\|e^{\btheta^{k_1}} - e^{\btheta^{k_2}}\|^2 + \log 2}{\log N} \bigg]
\end{align}

Thus the remaining task is to construct a set of $N$ score vectors $\{\btheta^1, \btheta^2, \ldots \btheta^{N}\} \subset \Theta_B(a,b)$ which are well separated, so as to get suitable bounds on the terms $\|e^{\btheta^{k_1}} - e^{\btheta^{k_2}}\|^2, ~\forall k_1,k_2 \in [N]$ in \eqref{eq:lb1}. We use the following construction for the purpose:

{\bf Constructing the set of score vectors.} 
For any $k \in [N]$, we construct the $k^{th}$ score vector $\btheta^k$ of the set of $N$ random score vectors as follows:

\begin{itemize}
\item Draw $\alpha$ many random variables $X_1^k, X_2^k, \ldots X_\alpha^k \sim Unif\Big[\Big(\frac{1}{2} - {\beta\delta}\Big),\Big(\frac{1}{2} + {\beta\delta}\Big)\Big]$, where $\beta$ is a constant to be adjusted later. 

\item Set $\theta^k_{i} = a + (b-a)X^k_{i} , ~\forall i \in [\alpha]$, $0 < a < b < 1$.

\item Consider the coefficient matrix $\bB \in \bR_{+}^{n \times \alpha}$ such that $\sum_{j = 1}^{\alpha}B_{ij} =1 , ~\forall i \in [n]$.

\item Set the remaining scores $\theta^k_{i}$ according to \eqref{eqn:scores} for all $i \in [n]\setminus [\alpha]$.
\end{itemize}

%where $S = \sum_{i=1}^{n}\sum_{j = 1}^{\alpha}B_{ij}$ and $s_j = \sum_{i=1}^{n}B_{ij}$.

We denote the restriction of the score vector $\btheta^{k}$ to the independent set $\cI(G)$ by $\btheta^{k}_{[\alpha]} \in \bR^\alpha$, where w.l.o.g. we assume $\cI(G) = [\alpha]$ as before. 
Furthermore, from \eqref{eqn:scores} for any two $k_1,k_2 \in [N]$, we have 
\begin{align}
\label{eq:lb0}
\lambda_{\min}(\bB^{T}\bB)\|\btheta^{k_1}_{[\alpha]} - \btheta^{k_2}_{[\alpha]}\|^2 \le \|\btheta^{k_1} - \btheta^{k_2}\|^2 \le \lambda_{\max}(\bB^{T}\bB)\|\btheta^{k_1}_{[\alpha]} - \btheta^{k_2}_{[\alpha]}\|^2
\end{align}
where $\lambda_{\min}(\bB^{T}\bB)$ and $\lambda_{\max}(\bB^{T}\bB)$ respectively denote the minimum and maximum non-zero eigenvalues of the positive semi-definite matrix $\bB^{T}\bB$. 

\begin{lem}
\label{lem:lb1}
$
\frac{1}{6}(b-a)^2\alpha\beta^2\delta^2 \le \|\btheta^{k_1}_{[\alpha]} - \btheta^{k_2}_{[\alpha]}\|^2 \le \frac{7}{6}(b-a)^2\alpha\beta^2\delta^2$, for all pairs $(k_1,k_2) \in [N]\times [N]$ with $k_1 \neq k_2$, with probability at least $(1 - N^2e^{-\frac{\alpha}{32}})$.
\end{lem}

\begin{proof}
Firstly we note that $\|\btheta^{k_1}_{[\alpha]} - \btheta^{k_2}_{[\alpha]}\|^2 = \sum_{i = 1}^{\alpha}(\theta_i^{k_1} - \theta_i^{k_2})^2$ and
for any $i \in [\alpha]$, $(\theta_i^{k_1} - \theta_i^{k_2})^2 = (b-a)^2(X_i^{k_1} - X_i^{k_2})^2$ and $\bE[(X_i^{k_1} - X_i^{k_2})^2]  = \frac{2}{3}\beta^2\delta^2$.
Now applying Hoeffding's inequality we have that
\[
\bP\Big(|\sum_{i = 1}^{\alpha}(X_i^{k_1} - X_i^{k_2})^2 - \frac{2}{3}\alpha\beta^2\delta^2| \ge \frac{1}{2}\alpha\beta^2\delta^2\Big) \le 2e^{-\frac{\alpha}{32}},
\]
for any fixed pair $k_1 \neq k_2$ in $[N]$. Applying a union bound, the above deviation occurs for at least one of the $\binom{N}{2}$ pairs $(k_1,k_2)$ with probability at most $N(N-1)e^{-\frac{\alpha}{32}} \le N^2e^{-\frac{\alpha}{32}}$.
Now for any $N < e^{\frac{\alpha}{64}}$, we have $N^2e^{-\frac{\alpha}{32}} < 1$ for all $\alpha > 0$, and hence with non-zero probability of at least $(1 - N^2e^{-\frac{\alpha}{32}}) > 0$, we have 
\[
\frac{1}{6}\alpha\beta^2\delta^2 \le \sum_{i = 1}^{\alpha}(X_i^{k_1} - X_i^{k_2})^2 \le \frac{7}{6}\alpha\beta^2\delta^2, ~~\forall k_1,k_2 \in [N]\times[N].
\]

Combining above we get, 
\[
\frac{1}{6}(b-a)^2\alpha\beta^2\delta^2 \le \|\btheta^{k_1}_{[\alpha]} - \btheta^{k_2}_{[\alpha]}\|^2 \le \frac{7}{6}(b-a)^2\alpha\beta^2\delta^2,
\]
for all $k_1,k_2 \in [N]\times [N]$, with probability at least $(1 - N^2e^{-\frac{\alpha}{32}})$.
\end{proof}
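
\noindent As a sanity check on the constants in the proof of Lemma \ref{lem:lb1}, the following sketch assumes that the $X_i^{k}$ are independent across $i$ and $k$ and take values in $[\frac{1}{2}-\beta\delta, \frac{1}{2}+\beta\delta]$, and, for the expectation only, that they are uniform on this interval (the independence and uniformity are assumptions about the construction, made here for illustration). For a fixed pair $k_1 \neq k_2$, each summand $(X_i^{k_1}-X_i^{k_2})^2$ then lies in $[0, 4\beta^2\delta^2]$, and
\begin{align*}
\bE[(X_i^{k_1}-X_i^{k_2})^2] &= 2\,\mathrm{Var}(X_i^{k_1}) = 2\cdot\frac{(2\beta\delta)^2}{12} = \frac{2}{3}\beta^2\delta^2,\\
2\exp\Big(-\frac{2t^2}{\alpha(4\beta^2\delta^2)^2}\Big) &= 2\exp\Big(-\frac{\alpha^2\beta^4\delta^4/2}{16\,\alpha\beta^4\delta^4}\Big) = 2e^{-\frac{\alpha}{32}} \quad \text{at } t = \tfrac{1}{2}\alpha\beta^2\delta^2.
\end{align*}
Under these assumptions the exponent agrees with the Hoeffding bound used in the proof, and $\frac{2}{3} \mp \frac{1}{2}$ gives the constants $\frac{1}{6}$ and $\frac{7}{6}$.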

For convenience let us fix $N = e^{\frac{\alpha}{128}}$. Thus using Lemma \ref{lem:lb1} on \eqref{eq:lb0}, we get
\begin{align}
\nonumber \frac{\lambda_{\min}(\bB^{T}\bB)}{6}(b-a)^2\alpha\beta^2\delta^2 \le \|\btheta^{k_1} - \btheta^{k_2}\|^2 \le \frac{7\lambda_{\max}(\bB^{T}\bB)}{6}(b-a)^2\alpha\beta^2\delta^2,
\end{align}
with probability at least $(1 - e^{-\frac{\alpha}{64}})$.
Now setting $\beta = \frac{\sqrt{6}}{(b-a)\sqrt{\alpha\lambda_{\min}(\bB^{T}\bB)}}$ in the above, we get 
\begin{align}
\label{eq:lb2} 
\delta^2 \le \|\btheta^{k_1} - \btheta^{k_2}\|^2 \le \frac{7\lambda_{\max}(\bB^{T}\bB)}{\lambda_{\min}(\bB^{T}\bB)}\delta^2, \text{ with probability at least } (1 - e^{-\frac{\alpha}{64}})
\end{align}
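
\noindent The two substitutions leading to \eqref{eq:lb2} can be spelled out as follows (a worked check that uses only the quantities already defined). With $N = e^{\frac{\alpha}{128}}$,
\[
N^2e^{-\frac{\alpha}{32}} = e^{\frac{\alpha}{64} - \frac{\alpha}{32}} = e^{-\frac{\alpha}{64}},
\]
and with $\beta^2 = \frac{6}{(b-a)^2\alpha\lambda_{\min}(\bB^{T}\bB)}$,
\[
\frac{\lambda_{\min}(\bB^{T}\bB)}{6}(b-a)^2\alpha\beta^2\delta^2 = \delta^2,
\qquad
\frac{7\lambda_{\max}(\bB^{T}\bB)}{6}(b-a)^2\alpha\beta^2\delta^2 = \frac{7\lambda_{\max}(\bB^{T}\bB)}{\lambda_{\min}(\bB^{T}\bB)}\delta^2.
\]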

\begin{lem}
\label{lem:lb2}
Given any two $\btheta, \btheta' \in [a,b]^{n}$, such that $0 < a < b < 1$, we have
\[
\|e^{\btheta} - e^{\btheta'}\|^2 \le e^{2(b+1)}\|\btheta - \btheta'\|^2 
\]
\end{lem}

\begin{proof}
The proof follows from the following straightforward deduction:
\begin{align*}
\|e^{\btheta} - e^{\btheta'}\|^2 & = \sum_{i = 1}^{n}(e^{\theta_i} - e^{\theta'_i})^2 = \sum_{i = 1}^{n}(e^{\theta'_i})^2(e^{\theta_i - \theta'_i} - 1)^2\\
& \le \sum_{i = 1}^{n}e^{2b}(e^{\theta_i - \theta'_i} - 1)^2 
\le e^{2b}\sum_{i = 1}^{n}(({\theta_i - \theta'_i})(e-1))^2\\
& \le  e^{2(b+1)}\|\btheta - \btheta'\|^2, 
\end{align*}
where the second-to-last inequality follows from the fact that $-1 < \theta_i - \theta'_i < 1$ for all $i \in [n]$, which holds since $\theta_i, \theta'_i \in [a,b] \subset (0,1)$.
\end{proof}
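
\noindent The scalar inequality behind the proof of Lemma \ref{lem:lb2} can be checked as follows (a short sketch, where $x$ is shorthand introduced here for $\theta_i - \theta'_i$). For $x \in [0,1]$, convexity of $e^x$ places the graph below the chord through $(0,1)$ and $(1,e)$, so $e^x - 1 \le (e-1)x$. For $x \in [-1,0]$, the bound $e^x \ge 1 + x$ gives $1 - e^x \le |x|$. Hence
\[
(e^{x} - 1)^2 \le (e-1)^2x^2 \le e^2x^2 \quad \text{for all } |x| \le 1,
\]
and multiplying by $e^{2b}$ and summing over $i \in [n]$ yields the factor $e^{2(b+1)}$.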

We now assume that our constructed score vectors $\btheta^{k}$ satisfy $0<a<\theta^k_i<b<1$, $\forall i \in [n], \forall k \in [N]$; we will shortly show that this indeed holds by our construction of $\btheta^k$. Then applying Lemma \ref{lem:lb2} and subsequently Lemma \ref{lem:lb1} to \eqref{eq:lb1}, we further get

\begin{align}
\label{eq:lb3}
\bE[\| \bhtheta - \btheta \|] \ge \frac{\delta}{2}\bigg[ 1 - \frac{{448e^{2(b+1)}K\zeta\Lambda}{\delta^2} + 128\log 2}{\alpha} \bigg],
\end{align}
where $\Lambda = \frac{\lambda_{\max}(\bB^{T}\bB)}{\lambda_{\min}(\bB^{T}\bB)}$, and $N = e^{\frac{\alpha}{128}}$. 

Thus setting $\delta = \frac{\sqrt \alpha}{4\sqrt{448 \zeta K \Lambda e^{2(b+1)}}}$, we have that  
\[
{448e^{2(b+1)}K\zeta\Lambda}{\delta^2} + 128\log 2 \le \frac{\alpha}{2}, \text{ for any } \alpha \ge 512\log2,
\]
and substituting this into \eqref{eq:lb3} further gives
\[
\bE[\| \bhtheta - \btheta \|] \ge \frac{\delta}{4} = \frac{\sqrt \alpha}{16\sqrt{448 \zeta K \Lambda e^{2(b+1)}}} = \frac{\sqrt{\alpha\lambda_{\min}(\bB^{T}\bB)}}{16\sqrt{448 \zeta K \lambda_{\max}(\bB^{T}\bB) e^{2(b+1)}}}.
\]
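
\noindent For completeness, the choice of $\delta$ can be verified directly (a worked check of the constants above). Since $\delta^2 = \frac{\alpha}{16\cdot 448\, \zeta K \Lambda e^{2(b+1)}}$, and since $\alpha \ge 512\log 2$ is equivalent to $128\log 2 \le \frac{\alpha}{4}$,
\[
448e^{2(b+1)}K\zeta\Lambda\delta^2 + 128\log 2 \le \frac{\alpha}{16} + \frac{\alpha}{4} = \frac{5\alpha}{16} \le \frac{\alpha}{2},
\]
so the bracketed term in \eqref{eq:lb3} is at least $\frac{1}{2}$ and the right-hand side is at least $\frac{\delta}{2}\cdot\frac{1}{2} = \frac{\delta}{4}$.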

Finally, it remains to show that the score vectors $\btheta^k$ constructed above indeed lie in the set $\Theta_{\bB}(a,b)$, $\forall k \in [N]$. 
Note that if we can show $X^k_i \in [0,1], \forall i \in [\alpha]$, then this immediately implies $\theta^k_i \in [a,b], \forall i \in [n]$, by our construction of $\btheta^k$ and the assumption that the coefficient matrix $\bB \in \bR_{+}^{n \times \alpha}$ satisfies $\sum_{j = 1}^{\alpha}B_{ij} =1 , ~\forall i \in [n]$.

Now we have $\Big(\frac{1}{2} - {\beta\delta}\Big) \le X^k_{i} \le \Big(\frac{1}{2} + {\beta\delta}\Big), ~\forall i \in [\alpha]$ and $k \in [N]$. With $\beta = \frac{\sqrt{6}}{(b-a)\sqrt{\alpha\lambda_{\min}(\bB^{T}\bB)}}$ and $\delta = \frac{\sqrt {\alpha \lambda_{\min}(\bB^{T}\bB)}}{4\sqrt{448 \zeta K \lambda_{\max}(\bB^{T}\bB) e^{2(b+1)}}}$, we have 
\[
\beta\delta = \frac{\sqrt{6}}{4(b-a)\sqrt{448 \zeta K \lambda_{\max}(\bB^{T}\bB) e^{2(b+1)}}} < \frac{1}{2},
\]
where the inequality holds whenever $(b-a)\sqrt{448 \zeta K \lambda_{\max}(\bB^{T}\bB) e^{2(b+1)}} > \frac{\sqrt{6}}{2}$.
Hence $0 \le X^k_i \le 1, ~\forall i \in [\alpha]$ and indeed we have $\btheta^k \in \Theta_{\bB}(a,b), ~\forall k \in [N]$.
%for suitable choice of $a,b$ and $\zeta$.
The desired  lower bound now follows as: 
\[
\frac{\bE[\| \bhtheta - \btheta \|]}{\|\btheta\|} \ge \frac{\sqrt{\lambda_{\min}(\bB^{T}\bB)}}{16b\lambda_{\max}(\bB^{T}\bB)\sqrt{448 \zeta K e^{2(b+1)}}},
\]
since $\|\btheta\| \le \sqrt{\lambda_{\max}(\bB^{T}\bB)}\|\btheta_{[\alpha]}\| \le b\sqrt{\lambda_{\max}(\bB^{T}\bB)\alpha}$.
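
\noindent The final normalization can be written out explicitly (a worked step combining the bound on $\bE[\| \bhtheta - \btheta \|]$ with the bound on $\|\btheta\|$ stated above):
\[
\frac{\sqrt{\alpha\lambda_{\min}(\bB^{T}\bB)}}{16\sqrt{448 \zeta K \lambda_{\max}(\bB^{T}\bB) e^{2(b+1)}}}\cdot\frac{1}{b\sqrt{\lambda_{\max}(\bB^{T}\bB)\alpha}}
= \frac{\sqrt{\lambda_{\min}(\bB^{T}\bB)}}{16b\lambda_{\max}(\bB^{T}\bB)\sqrt{448 \zeta K e^{2(b+1)}}}.
\]
The explicit factor $\sqrt{\alpha}$ cancels, so $\alpha$ enters the resulting bound only through the eigenvalues of $\bB^{T}\bB$.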

\end{proof}


\iffalse%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Conclusion and Future Works}
\label{app:concl}

We introduce a feature-based probabilistic preference model, f-BTL, and propose a least-squares-based algorithm, \fbtl, which is shown to achieve much tighter sample complexity bounds for the problem of ranking from pairwise comparisons in the presence of feature information.
%We have proposed a least squares based algorithm and have shown theoretical recovery guarantees for the same. 
%Furthermore, we derive an information theoretic lower bound for the problem that shows optimality of our proposed algorithm with a matching lower bound guarantee.
While least-squares-based algorithms are a natural choice, it would be interesting to see how Markov-chain-based approaches, e.g., \emph{Rank Centrality} \cite{negahban+12}, can be extended to accommodate feature information. One could also consider the contextual setting, which additionally introduces user features. Analyzing the sample complexity of recovering a partial ordering (e.g., the top-$K$ items) would be useful too.

\fi %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%