\vspace*{-10pt}

\section{General Case: Unknown preferences}
\label{sec:noise}
\label{SEC:NOISE}
\vspace*{-10pt}

In this section, we consider the original problem, where we do not have access to the exact $P_{ij}$ values but only to estimates of them obtained from the $K$ independent comparisons made for each pair. In this setting, we cannot expect to solve the linear equations exactly. We propose fBTL-LS, a least squares based algorithm, shown in Algorithm \ref{alg:fbtlls}, to solve for the score vector. 
Let the graph induced by the edge set $M$ on the $n$ nodes be called the \emph{comparison graph}. The node-edge incidence matrix $\bQ \in \bR^{n \times m}$ used in the algorithm satisfies $\bQ\bQ^{T} = \bL$, where $\bL$ is the standard unnormalized Laplacian of the comparison graph, i.e., $\bL = \bQ\bQ^T= \bD- \bA$, where $\bD$ is the diagonal matrix of degrees and $\bA$ is the adjacency matrix. Algorithm \ref{alg:fbtlls} is motivated by the fact that, when the true probabilities are known exactly, the following holds:
\begin{equation}
\bQ^{T}\bB\bv = \by
\end{equation}
where $\by = (y_{ij})_{(i,j) \in M} \in \bR^m$ with $y_{ij} = \log\left (\frac{{P}_{ij}}{{P}_{ji}}\right)$ for all $(i,j) \in M$, and $\bv \in \bR^{\alpha}$ is such that $v_i = \theta_i ~\forall i \in [\alpha]$. This relation follows since $y_{ij} = \log\left (\frac{{P}_{ij}}{{P}_{ji}}\right) = \log \left ( \frac{e^{\theta_i}}{e^{\theta_j}}\right) = \theta_i - \theta_j, \, \forall i,j \in [n]$, by the property of the f-BTL model (Section \ref{sec:prb_set}). Since only noisy estimates $\hat{\by}$ are available instead of the true $\by$, we take a least squares approach. The details are described in Algorithm \ref{alg:fbtlls}.
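
As a small illustration of the incidence-matrix notation, consider a hypothetical three-item instance (not taken from the experiments) with $n=3$ and $M=\{(1,2),(1,3)\}$. The sketch assumes an orientation convention in which the column of $\bQ$ for a pair $(i,j)$ has $+1$ in row $i$ and $-1$ in row $j$. Under this assumption,
\begin{align*}
\bQ = \begin{pmatrix} 1 & 1\\ -1 & 0\\ 0 & -1 \end{pmatrix}, \quad
\bQ^{T}\btheta = \begin{pmatrix} \theta_1-\theta_2\\ \theta_1-\theta_3 \end{pmatrix}, \quad
\bQ\bQ^{T} = \begin{pmatrix} 2 & -1 & -1\\ -1 & 1 & 0\\ -1 & 0 & 1 \end{pmatrix} = \bD - \bA,
\end{align*}
so each entry of $\bQ^{T}\btheta$ is the score difference $y_{ij} = \theta_i - \theta_j$ of one compared pair, and $\bQ\bQ^{T}$ is the Laplacian of the path $2 - 1 - 3$.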

\subsection{Connectivity} 
The results of \cite{borkarNi16} show that the sample complexity of the least squares algorithm for the standard BTL model depends on how well connected the comparison graph is. Precisely, this is measured by the second smallest eigenvalue of the Laplacian $\bL$, which is $0$ if and only if the comparison graph is disconnected. Thus, when the comparison graph is disconnected, there is no way to recover the score vector in the standard BTL case. However, as we will see below, our analysis depends on the least eigenvalue of the matrix $\tilde{\bQ}\tilde{\bQ}^{T}$ and not on that of the Laplacian. The important point to note here is that \emph{even if the comparison graph is disconnected, the fBTL-LS algorithm may still recover the score vector}. This is because the algorithm makes use of the matrix $\bB$ of coefficients to relate scores across possibly disconnected components of the comparison graph.

%\vspace*{-8pt}
\begin{algorithm}[h]
\begin{algorithmic}
\REQUIRE $G$, $\bU$, a set $M$ of $m$ pairs each compared $K$ times. 

\STATE {Compute $\bB$ from $\bU$} such that Equation \ref{eqn:basis} is satisfied for all $\bu_i, i\in [n]$.
\STATE Compute the node-edge incidence matrix $\bQ \in \bR^{n\times m}$ from $M$. Let $\tilde{\bQ} = \bB^T\bQ$
\STATE \hspace*{-5pt} Compute $\hat{P}_{ij} \hspace*{-4pt} = \hspace*{-4pt} \begin{cases} 
                        \textrm{fraction of times~} i \textrm{~beats~} j  ~\forall (i,j) \in M \\
                       0 \quad \forall (i,j) \notin M 
                        \end{cases}$
\STATE Compute $\bhy \in \bR^{m}$ where $\forall (i,j) \in M, \hat{y}_{ij} = \log\left (\frac{\hat{P}_{ij}}{\hat{P}_{ji}}\right) $
\STATE  {Solve~~} $ \bhv  = \arg \min_{\bx \in \bR^{\alpha}} \|\tilde{\bQ}^{T}\bx - \bhy \|$
\STATE Set $\hat{\theta}_i = \begin{cases} 
                        \hat{v}_i ~\forall i \in [\alpha] \\
                        \textrm{~compute using Equation~} \eqref{eqn:scores} ~\forall i \notin [\alpha] 
                        \end{cases}$
%\RETURN score vector ${\bhtheta}$  
\STATE {\bf return } score vector ${\bhtheta}$  
\vspace*{-1pt}
\end{algorithmic}
\caption{Algorithm: fBTL-LS}
\label{alg:fbtlls}
\end{algorithm}
%\vspace*{-8pt}

An example of such recovery from a disconnected comparison graph is shown in Figure \ref{fig:matrix}. Here $n=5$, $M = \{(1,2), (1,3), (4,5)\}$ and $m = |M| = 3$. As can be seen in the figure, the comparison graph is disconnected. The nodes circled in red are assumed to be the independent set nodes. The exact relation between the feature vectors of the independent set, i.e., $\{\bu_1,\bu_2\}$, and those not in the independent set, i.e., $\{\bu_3,\bu_4, \bu_5\}$, is given by the matrix $\bB$ shown in the figure. It can be verified for this example that the matrix $\bB^{T}\bL\bB$ (also shown in the figure) has a non-zero minimum eigenvalue, even though the Laplacian is block diagonal (which happens iff the comparison graph is disconnected).

\vspace*{-15pt}
\hspace*{-0pt}
\begin{figure}[h]
\includegraphics[scale=0.12]{./Plots/matrix.jpeg} 
\vspace*{-5pt}
\caption{A disconnected comparison graph for which $\bB^{T}\bL\bB$ has a non-zero minimum eigenvalue}
\label{fig:matrix}
\end{figure}
\hspace*{-0pt}
\vspace*{-5pt}
%We now prove the main result:
\begin{restatable}[\textbf{Recovery Guarantee for fBTL-LS Algorithm}]{thm}{fbtlls}
\label{thm:fbtlls}
\label{THM:FBTLLS}
Let $M$ be a set of $m$ edges generated as per the sampling model and let each pair in $M$ be compared $K$ times independently according to the f-BTL model. Then, for any $K \ge 6(1+e^{2b})^2\log n$, with probability at least $1 - \frac{2m}{n^3}$, the normalized $\ell_2$-error of Algorithm \ref{alg:fbtlls} satisfies
$$ \frac{ \|\bhtheta - \btheta\|}{\|\btheta\|} \le \frac{2}{a}\cdot \sqrt{\frac{\lambda_{\max}(\bB^{T}\bB)}{\lambda_{\min}(\bB^{T}\bB)}}\cdot \sqrt{\frac{m}{\alpha}}\cdot \frac{\sqrt{\lambda_n}}{\lambda_1},
$$
where $\lambda_1 = \min\{\lambda>0 \mid \lambda \text{ is an eigenvalue of } \bB^{T}\bL\bB\}$ and $\lambda_n = \lambda_{\max}(\bB^{T}\bL\bB)$. Here $\lambda_{\min}(\bB^{T}\bB)$ and $\lambda_{\max}(\bB^{T}\bB)$ denote, respectively, the minimum and maximum non-zero eigenvalues of the positive semi-definite matrix $\bB^{T}\bB$, and
 $a,b > 0$ denote the range of the f-BTL parameters, such that $|\theta_i| \ge a, ~\forall i \in [\alpha]$ and $|\theta_i| \le b, ~\forall i \in [n]$.
\end{restatable}

\begin{proof}\textbf{(sketch)}
Let us denote the {\it reduced Laplacian} matrix by $\btL = \tbQ\tbQ^T = \bB^{T}\bQ\bQ^T\bB = \bB^{T}\bL\bB$, which is clearly positive semi-definite. Let $f(\bx) = \|\tilde{\bQ}^{T}\bx - \hat{\by} \|^2$. Then $\bhv = \arg \min_{\bx \in \bR^{\alpha}}f(\bx)$ in Algorithm \ref{alg:fbtlls} satisfies the optimality condition $\nabla f(\bhv) = 0$, that is,
\begin{align}
\label{eq:prf_ls_1_m}
\tbQ \bhy = \tbQ\tbQ^T \bhv = \btL\bhv.
\end{align}
On the other hand, let $\bv \in \bR^{\alpha}$ be such that $v_i = \theta_i, ~\forall i \in [\alpha]$, and let $\by \in \bR^{m}$ be such that $y_{ij} = \log \bigg( \frac{P_{ij}}{P_{ji}} \bigg)$. Then $\bv = \arg \min_{\bx \in \bR^{\alpha}}\| \tbQ^T\bx - \by \|^2$, which gives
\begin{align}
\label{eq:prf_ls_2_m}
\tbQ \by = \btL \bv.
\end{align}
The above condition holds because, for any $i,j \in [n]$, $y_{ij} = \theta_i - \theta_j$, and so $\by = \bQ^T\btheta = \bQ^T\bB \bv = \tbQ^T \bv$, where the second equality holds due to \eqref{eqn:scores}. Combining \eqref{eq:prf_ls_1_m} and \eqref{eq:prf_ls_2_m} we get
$
\tbQ (\by - \bhy) = \btL (\bv - \bhv)
$
from which it can be shown that,
$ \lambda_{\min}(\btL\btL^T) \|\bv - \bhv\|^2 
 \le \lambda_{\max}(\tbQ^T\tbQ) \|\by - \bhy\|^2.
$
Noting that $\lambda_{\max}(\tbQ^T\tbQ) = \lambda_{\max}(\tbQ\tbQ^T) = \lambda_n$ and $\lambda_{\min}(\btL\btL^T) = (\lambda_{\min}(\btL))^2 = (\lambda_{\min}(\tbQ\tbQ^T))^2 = \lambda^2_1$, the above further implies:
\begin{align}
\label{eq:prf_ls_4_m}
\|\bv - \bhv\| \le \frac{\|\by - \bhy\|\sqrt{\lambda_n}}{\lambda_1}.
\end{align}
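
The step from $\tbQ (\by - \bhy) = \btL (\bv - \bhv)$ to \eqref{eq:prf_ls_4_m} can be unpacked as follows. This is only a sketch, and it assumes that $\btL$ is non-singular, so that $\lambda_{\min}(\btL) = \lambda_1 > 0$ (an assumption made here for the illustration). Taking squared norms on both sides,
\begin{align*}
\lambda_{\min}(\btL^T\btL)\,\|\bv - \bhv\|^2 \le \|\btL(\bv - \bhv)\|^2 = \|\tbQ(\by - \bhy)\|^2 \le \lambda_{\max}(\tbQ^T\tbQ)\,\|\by - \bhy\|^2.
\end{align*}
Since $\btL$ is symmetric, $\lambda_{\min}(\btL^T\btL) = \lambda_{\min}(\btL\btL^T) = \lambda_1^2$, and dividing by $\lambda_1^2$ and taking square roots gives \eqref{eq:prf_ls_4_m}.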

Now, in order to bound $\|\by - \bhy\| = \sqrt{\sum_{(i,j) \in M}(y_{ij} - \hat y_{ij})^2}$, we first note:
$
|y_{ij} - \hat y_{ij}| \le |(\log P_{ij}-\log \hat P_{ij})| + |(\log P_{ji}-\log \hat P_{ji})|.
$
Denoting $\nu_{ij} = |P_{ij}-\hat P_{ij}|$ and applying {\it Hoeffding's inequality}, we get, for any $\eta > 0$,
\begin{align}
\label{eq:prf_ls_6_m}
\bP\Big( \nu_{ij} \ge \eta  \Big) = \bP\Big( |P_{ij}-\hat P_{ij}| \ge \eta  \Big) \le 2e^{-2\eta^2K}
\end{align}
As $|\theta_i| \le b, \forall i \in [n]$, we have $\frac{1}{1+e^{2b}} \le P_{ij} \le \frac{e^{2b}}{1+e^{2b}}, \forall i,j \in [n]$. Also, as $K \ge 6(1+e^{2b})^2\log n$, using \eqref{eq:prf_ls_6_m}
and further taking a union bound over all pairs in $M$, we get
\begin{align}
\label{eq:prf_ls_6b_m}
\bP\bigg(\forall (i,j) \in M, ~\nu_{ij} < \frac{P_{ij}}{2} \bigg) \ge 1-\frac{2m}{n^3}.
\end{align}
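
The arithmetic behind the threshold on $K$ can be checked directly. The following is a sketch that uses only the lower bound $P_{ij} \ge \frac{1}{1+e^{2b}}$ stated above, with the particular choice $\eta = \frac{1}{2(1+e^{2b})} \le \frac{P_{ij}}{2}$ in \eqref{eq:prf_ls_6_m}:
\begin{align*}
& 2\eta^2 K \ge \frac{2}{4(1+e^{2b})^2}\cdot 6(1+e^{2b})^2\log n = 3\log n, \text{ and hence}\\
& \bP\Big( \nu_{ij} \ge \frac{P_{ij}}{2} \Big) \le \bP\big( \nu_{ij} \ge \eta \big) \le 2e^{-3\log n} = \frac{2}{n^3}.
\end{align*}
Summing this failure probability over the $m$ pairs in $M$ gives the $\frac{2m}{n^3}$ term.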
Define $g: (0,1] \to \bR$ by $g(p) = \log(p), ~\forall p \in (0,1]$. Using Taylor's theorem, one can obtain a $p^* \in [P_{ij} - \nu_{ij},P_{ij} + \nu_{ij}]$ such that
\begin{align*}
& \log \hat P_{ij} = \log P_{ij} + \frac{1}{p^*}(\hat P_{ij} -  P_{ij}), \text{ or equivalently,}\\
& \frac{\log(\hat P_{ij}) - \log P_{ij}}{(\hat P_{ij} -  P_{ij})}  = \frac{1}{p^*} \le \frac{2}{P_{ij}},
\end{align*}
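where the last inequality holds with probability at least $(1-\frac{2m}{n^3})$ by \eqref{eq:prf_ls_6b_m}, since then $p^* \ge P_{ij} - \nu_{ij} > \frac{P_{ij}}{2}$.
Furthermore, on this high probability event, as $|\hat{P}_{ij} - P_{ij}| < \frac{P_{ij}}{2}$, each log-difference is bounded as $|\log \hat P_{ij} - \log P_{ij}| \le \frac{2}{P_{ij}}\nu_{ij} < 1$, and one can show
$\|\by - \bhy \| \le 2\sqrt{m} $. 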
%where recall that $m = |E|$ is the total number of edges sampled.
%Again applying \eqref{eq:prf_ls_6} for any given arbitrary $\eta \le \frac{1}{2(1+e^{2b})^2}$, $K \ge 6(1+e^{2b})^2\log n$ and applying union bound over all $m$ pair of sampled edges $(i,j) \in M$, we get that
%\[
%\bP\big(\forall (i,j) \in M, ~\nu_{ij} \ge \eta\big) \le %2me^{-2\eta^2K}.
%\]
Substituting this into \eqref{eq:prf_ls_4_m}, we get 
\begin{align}
\label{eq:prf_ls_7_m}
\|\bv - \bhv\| \le \frac{\|\by - \bhy\|\sqrt{\lambda_n}}{\lambda_1} \le \frac{2\sqrt{m\lambda_n}}{\lambda_1}
\end{align}
with probability at least $\big( 1 - \frac{2m}{n^3} \big)$. The proof finally follows by noting that, since $|\theta_i| \ge a, ~\forall i \in [\alpha]$, we have $\|\bv\| \ge a\sqrt{\alpha}$. Moreover, as $\btheta = \bB\bv$, $\|\btheta\| \ge \sqrt{\lambda_{\min}(\bB^{T}\bB)}\|\bv\| \ge a\sqrt{\alpha\lambda_{\min}(\bB^{T}\bB)}$.
On the other hand, $\bhtheta = \bB\bhv$, and thus
\[
\|\btheta - \bhtheta\| = \|\bB(\bv - \bhv)\| \le \sqrt{\lambda_{\max}(\bB^{T}\bB)}\|\bv - \bhv\|. 
\]
Combining the above observations with \eqref{eq:prf_ls_7_m} yields the desired bound.
The complete proof is given in Appendix \ref{app:noise_thm}.
\end{proof}

\begin{rem}\emph{
Thm. \ref{thm:fbtlls} shows that the normalized error is bounded by a product of four terms. The first term $\frac{2}{a}$ can be treated as a constant that depends on the minimum score of the f-BTL model, and acts as a sensitivity component of the error bound. The second term is the condition number of the feature coefficient matrix $\bB$ and captures how the features interact with each other. The third term depends on the number of pairs seen in $M$. When $|M| = m = \alpha \log \alpha$, this term becomes $\sqrt{\log \alpha}$. The fourth term varies with the number of sampled pairs, as it depends on $\bL$, the Laplacian of the comparison graph. If both $\lambda_{n}$ and $\lambda_1$ are $O(\log \alpha)$, then the normalized error is a constant with probability at least $1 - \mathrm{poly}(\frac{1}{n})$. Thus, the result essentially says that if one sees $O(\alpha \log \alpha )$ samples and $\bB$ is such that both $\lambda_1$ and $\lambda_n$ are $O(\log \alpha )$, then the normalized error is bounded by a small constant. 
%
The $m$ in the numerator could therefore be misleading, as one expects the error to decrease with increasing $m$. However, as explained above, when the effect of all $m$-dependent factors, including the eigenvalues of $\bB^{T}\bL\bB$, is combined, the error bound on the right hand side decreases as $m$ scales as $O(\alpha \log \alpha )$.}
\end{rem}
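
To make the scaling discussed in the remark concrete, suppose, purely as an illustrative assumption, that $m = \alpha\log\alpha$ and $\lambda_1 = \lambda_n = c\log\alpha$ for some hypothetical constant $c > 0$. Then
\[
\sqrt{\frac{m}{\alpha}}\cdot \frac{\sqrt{\lambda_n}}{\lambda_1} = \sqrt{\log\alpha}\cdot\frac{\sqrt{c\log\alpha}}{c\log\alpha} = \frac{1}{\sqrt{c}},
\]
so in this regime the last two factors of the bound in Theorem \ref{thm:fbtlls} together do not grow with $\alpha$.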

\iffalse
\begin{rem}
{\color{red} a,b,d dependencies.}
%Sensitivity of Phat_ij: The dynamic range [a,b], a fixed upper and lower bound on the true scores (\theta_i) captures this. UB does depend on K and reduces as K increase -- justify. Also why does it increase with m?

% The theorem requires K to be at least greater than a certain quantity for the guarantees to hold which could be easily modified to make the dependency on K explicit through Eqn-(13).  
%Regarding m, it's appearance on the numerator could be misleading. But when one considers the combined effect of all m-dependent factors including eigenvalues of B'LB, the error decreases with increasing m --- when both the eigenvalues are O(\log\alpha), m = O(\alpha\log\alpha)  which is much lesser compared to O(n\logn) for BTL.
\end{rem}
\fi