\vspace*{-5pt}
\section{Lower Bound}
\label{sec:lb}
\label{SEC:LB}
\vspace{-5pt}

In this section, we compare the $\ell_2$-error rate achieved by the fBTL-LS algorithm (Theorem \ref{thm:fbtlls}) with the minimax $\ell_2$-error rate over the class of feature Bradley-Terry-Luce ({f-BTL}) models.
Theorem \ref{thm:lb} gives an information-theoretic lower bound on the $\ell_2$-error rate achievable by any learning algorithm that estimates the score parameters of the {f-BTL} model.
%Our derived lower bound guarantee is given below:

\begin{restatable}[\textbf{Lower bound for estimating the parameters of the f-BTL model}]{thm}{lb}
\label{thm:lb}
\label{THM:LB}
Consider the following set of score vectors $\Theta_{\bB}(a,b)$ of an f-BTL model, defined with respect to the coefficient matrix $\bB$ and range parameters $a,b>0$:
$
\Theta_{\bB}(a,b) = \{\theta \in \bR^n \mid \theta  \text{ satisfies } \eqref{eqn:scores}, ~ |\theta_i| \le a 
 ~\forall i \in [\alpha], ~ |\theta_i| \ge b ~\forall i \in [n] \}.
$
%\vspace*{-15pt}

Now suppose the learner (an algorithm that estimates the scores of an f-BTL model) 
is given access to noisy pairwise preferences 
%of a set $M \subseteq [\binom{n}{2}]$ item-pairs 
sampled according to a $\mathcal G(n,p)$ Erd\H{o}s-R\'enyi random graph with $p=\frac{\zeta}{n}$ for some $\zeta > 0$, where $K$ independent noisy pairwise preferences are available for each sampled pair, generated according to some unknown f-BTL model in $\Theta_{\bB}(a,b)$. Let $\bhtheta \in \bR^n$ denote the learner's estimate of the f-BTL score vector based on the sampled pairwise preferences, and suppose the environment chooses a worst-case true score vector $\btheta \in \Theta_{\bB}(a,b)$. Then, for any such learning algorithm,
\[
\vspace{-2pt}
\sup_{\btheta \in \Theta_{\bB}(a,b)}\frac{\bE[\| \bhtheta - \btheta \|]}{\|\btheta\|} \ge \frac{\sqrt{\lambda_{\min}(\bB^{T}\bB)}}{16b\lambda_{\max}(\bB^{T}\bB)\sqrt{448 \zeta K e^{2(b+1)}}},
\vspace{-2pt}
\]
where the expectation is over the randomness of the algorithm.% and the choice of the environment for selecting $\btheta \in \Theta_{\bB}(a,b)$.
\end{restatable}

Our proof uses a constructive argument: we generate the score vectors $\btheta$ from a uniform distribution that respects the f-BTL model in the dynamic range $|\theta_i| \in [a,b], ~\forall i \in [n]$, and reduce the stochastic inference problem to a multi-way hypothesis testing problem. The full proof is given in Appendix \ref{app:lb}.

\begin{proof}\textbf{(sketch)}
We reduce the above problem to a multi-class hypothesis testing problem as follows. Suppose we are given a set of $N$ score vectors $\{\btheta^1, \btheta^2, \ldots, \btheta^{N}\} \subset \Theta_{\bB}(a,b)$ such that $\|\btheta^{k_1} - \btheta^{k_2}\| \ge \delta$ for any two distinct indices $k_1,k_2 \in [N]$. Given the set of pairwise preferences generated by an unknown score vector $\btheta = \btheta^{L}$ ($L$ being a random index selected uniformly from $[N]$), the hypothesis testing task is to identify the index $L$ of the score vector.

Now consider any algorithm that predicts a score vector $\bhtheta$ based on the given set of pairwise preferences from the f-BTL model $\btheta^{L}$, sampled according to a $\mathcal G(n,p)$ Erd\H{o}s-R\'enyi random graph with $p=\frac{\zeta}{n}$ for some $\zeta > 0$, where $K$ independent noisy pairwise preferences are available for each sampled pair. One natural way to estimate $L$ is $\hat L = \arg \min_{k \in [N]}\| \bhtheta - \btheta^k \|$. Note that for $\hat L$ to differ from $L$, it must be the case that $\| \bhtheta - \btheta \| \ge \frac{\delta}{2}$. Thus one can write
$
\bE[\| \bhtheta - \btheta \|] \ge \frac{\delta}{2}\bP(\hat L \neq L).
$
Further, applying an information-theoretic analysis similar to that of \cite{negahban+12}, one gets
$\bE[\| \bhtheta - \btheta \|] \ge \frac{\delta}{2}\bigg[ 1 - \frac{\frac{K\zeta}{2N^2}\sum_{k_1 \in [N]}\sum_{k_2 \in [N]}\|e^{\btheta^{k_1}} - e^{\btheta^{k_2}}\|^2 + \log 2}{\log N} \bigg].
$
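
To spell out the two steps above (a sketch; the symbols $Y$ and $P_k$ are introduced here only for illustration): for any $k \neq L$, the triangle inequality and the $\delta$-separation give
\[
\| \bhtheta - \btheta^{k} \| \ge \| \btheta^{k} - \btheta^{L} \| - \| \bhtheta - \btheta^{L} \| \ge \delta - \| \bhtheta - \btheta^{L} \| ,
\]
so whenever $\| \bhtheta - \btheta^{L} \| < \frac{\delta}{2}$, every $\btheta^k$ with $k \neq L$ is strictly farther from $\bhtheta$ than $\btheta^L$ is, and hence $\hat L = L$. Consequently $\{\hat L \neq L\} \subseteq \{\| \bhtheta - \btheta \| \ge \frac{\delta}{2}\}$, and Markov's inequality yields
\[
\bE[\| \bhtheta - \btheta \|] \ge \frac{\delta}{2}\,\bP\Big(\| \bhtheta - \btheta \| \ge \frac{\delta}{2}\Big) \ge \frac{\delta}{2}\,\bP(\hat L \neq L).
\]
The bracketed term in the subsequent bound has the shape of Fano's inequality in its standard form, $\bP(\hat L \neq L) \ge 1 - \frac{I(L;Y) + \log 2}{\log N}$, where $Y$ denotes the observed comparisons and $I(L;Y) \le \frac{1}{N^2}\sum_{k_1,k_2 \in [N]}\mathrm{KL}(P_{k_1} \,\|\, P_{k_2})$, with $P_k$ the law of $Y$ under $\btheta^k$. Under this reading, the stated bound would correspond to $\mathrm{KL}(P_{k_1} \,\|\, P_{k_2}) \le \frac{K\zeta}{2}\|e^{\btheta^{k_1}} - e^{\btheta^{k_2}}\|^2$; the precise constants are those derived in Appendix \ref{app:lb}.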

Thus the remaining task is to construct a set of $N$ well-separated score vectors $\{\btheta^1, \btheta^2, \ldots, \btheta^{N}\} \subset \Theta_{\bB}(a,b)$ that admit suitable bounds on the terms $\|e^{\btheta^{k_1}} - e^{\btheta^{k_2}}\|^2, ~\forall k_1,k_2 \in [N]$, from which the desired lower bound follows.
For any $k \in [N]$, we construct the $k^{th}$ random score vector $\btheta^k$ as follows: 
\textbf{1.} Draw $\alpha$ independent random variables $X_1^k, X_2^k, \ldots, X_\alpha^k \sim \text{Unif}\Big[\Big(\frac{1}{2} - {\beta\delta}\Big),\Big(\frac{1}{2} + {\beta\delta}\Big)\Big]$, where $\beta$ is a constant to be adjusted later. 
\textbf{2.} Set $\theta^k_{i} = a + (b-a)X^k_{i} , ~\forall i \in [\alpha]$, where $0 < a < b < 1$.
\textbf{3.} Consider the coefficient matrix $\bB \in \bR_{+}^{n \times \alpha}$ such that $\sum_{j = 1}^{\alpha}B_{ij} =1 , ~\forall i \in [n]$.
\textbf{4.} Set the remaining scores $\theta^k_{i}$ according to \eqref{eqn:scores} for all $i \in [n]\setminus [\alpha]$.
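As a quick check on steps \textbf{1} and \textbf{2} (a sketch, assuming $\beta\delta \le \frac{1}{2}$, a condition not stated explicitly above): since $X_i^k \in [\frac{1}{2} - \beta\delta, \frac{1}{2} + \beta\delta] \subseteq [0,1]$, we get
\[
a \le a + (b-a)\Big(\frac{1}{2} - \beta\delta\Big) \le \theta^k_i \le a + (b-a)\Big(\frac{1}{2} + \beta\delta\Big) \le b, \quad \forall i \in [\alpha],
\]
so the first $\alpha$ scores lie in $[a,b]$. If, in addition, \eqref{eqn:scores} expresses each remaining score as the combination $\theta^k_i = \sum_{j=1}^{\alpha} B_{ij}\theta^k_j$ (an assumption about the form of \eqref{eqn:scores}, which is defined elsewhere), then step \textbf{3} makes every such $\theta^k_i$ a convex combination of values in $[a,b]$, and hence also an element of $[a,b]$.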
The claim then follows from the following two lemmas:
\begin{lem}
\label{lem:lb1}
$
\frac{1}{6}(b-a)^2\alpha\beta^2\delta^2 \le \|\btheta^{k_1}_{[\alpha]} - \btheta^{k_2}_{[\alpha]}\|^2 \le \frac{7}{6}(b-a)^2\alpha\beta^2\delta^2$ for all distinct $k_1,k_2 \in [N]$, with probability at least $(1 - N^2e^{-\frac{\alpha}{32}})$.
\end{lem}
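
One way to see where the constants in Lemma \ref{lem:lb1} may come from (a sketch assuming a Hoeffding-type argument; the proof in Appendix \ref{app:lb} may proceed differently): for independent $X, X' \sim \text{Unif}\big[\frac{1}{2} - \beta\delta, \frac{1}{2} + \beta\delta\big]$,
\[
\bE[(X - X')^2] = 2\,\mathrm{Var}(X) = 2\cdot\frac{(2\beta\delta)^2}{12} = \frac{2}{3}\beta^2\delta^2,
\qquad\text{so}\qquad
\bE\big[\|\btheta^{k_1}_{[\alpha]} - \btheta^{k_2}_{[\alpha]}\|^2\big] = \frac{2}{3}(b-a)^2\alpha\beta^2\delta^2
\]
for $k_1 \neq k_2$. The two sides of the lemma are this mean shifted by $\pm\frac{1}{2}(b-a)^2\alpha\beta^2\delta^2$. Since each $(X_i^{k_1} - X_i^{k_2})^2$ lies in $[0, 4\beta^2\delta^2]$, Hoeffding's inequality with deviation $t = \frac{1}{2}\beta^2\delta^2$ applied to the average of the $\alpha$ independent terms gives a failure probability of at most $2\exp\big(-\frac{2\alpha t^2}{(4\beta^2\delta^2)^2}\big) = 2e^{-\frac{\alpha}{32}}$ for a fixed pair, and a union bound over the $\binom{N}{2} \le \frac{N^2}{2}$ pairs gives $N^2 e^{-\frac{\alpha}{32}}$.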

\begin{lem}
\label{lem:lb2}
Given any two $\btheta, \btheta' \in [a,b]^{n}$ with $0 < a < b < 1$, we have
$
\|e^{\btheta} - e^{\btheta'}\|^2 \le e^{2(b+1)}\|\btheta - \btheta'\|^2 .
$
\end{lem}
Combining these lemmas with the lower bound on $\bE[\| \bhtheta - \btheta \|]$ derived above yields the result. The complete proof can be found in Appendix \ref{app:lb}.
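
For intuition, Lemma \ref{lem:lb2} can be checked entrywise (a sketch, not necessarily the argument used in the appendix): by the mean value theorem, for any $x, y \in [a,b]$ there is a $\xi$ between $x$ and $y$ with
\[
|e^{x} - e^{y}| = e^{\xi}|x - y| \le e^{b}|x-y| ,
\]
and squaring and summing over the $n$ coordinates gives $\|e^{\btheta} - e^{\btheta'}\|^2 \le e^{2b}\|\btheta - \btheta'\|^2 \le e^{2(b+1)}\|\btheta - \btheta'\|^2$.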
\end{proof}

\begin{rem}
%{\color{red}How to claim this matches the upper bound $\alpha\log \alpha$ (Thm \ref{thm:fbtlls})??}
Since $m = {n \choose 2}p$, or equivalently $\zeta = pn =  O\big(\frac{m}{n}\big)$, the above bound suggests that the left-hand side is bounded by a small constant upon observing $m = \alpha \log \alpha$ pairs with $K \ge \frac{n}{\alpha \log \alpha}$, which matches our derived upper bound of Thm. \ref{thm:fbtlls} for any $n \ge \alpha \log \alpha \log n$, establishing tightness of the results.
\end{rem}
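
To make the arithmetic behind this remark explicit (a sketch that treats $m$ as the expected number of sampled pairs): since $m = {n \choose 2}p$ and $\zeta = pn$,
\[
\zeta = \frac{2m}{n-1}, \qquad \text{so} \qquad \zeta K = \frac{2mK}{n-1}.
\]
Taking $m = \alpha \log \alpha$ and $K = \frac{n}{\alpha \log \alpha}$ gives $\zeta K = \frac{2n}{n-1} \le 4$ for $n \ge 2$, so the factor $\sqrt{448 \zeta K e^{2(b+1)}}$ in the denominator of the bound in Theorem \ref{thm:lb} is at most $\sqrt{1792\, e^{2(b+1)}}$, which depends on neither $n$ nor $\alpha$.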