\vspace*{-10pt}

\section{Analysis: Known preferences}
\label{sec:no_noise}
\label{SEC:NO_NOISE}
\vspace*{-10pt}
We begin by analyzing the noiseless case, where for every compared pair $(i,j)$ we have access to the exact value of $P_{ij}$. This analysis sheds light on the structure of the problem, which will be useful later when analysing the case where the $P_{ij}$s are unknown and must be estimated from noisy observations (Section \ref{sec:noise}). In this setting, the goal is to bound the number of samples $m$ needed to \emph{exactly} recover the score vector $\btheta$, where $\theta_i = \bw^T\bu_i ~\forall i \in [n]$. From Equation \ref{eqn:basis}, we have that $\bw^T\bu_i = \displaystyle \sum_{j \in \mathcal{I}(G)} B_{ji}\bw^T\bu_j$,
\begin{align}
\vspace{-5pt}
\label{eqn:scores}
\text{or equivalently,} ~~\theta_i = \displaystyle \sum_{j \in \mathcal{I}(G)} B_{ji}\theta_j ~~\forall i \in [n].
\end{align}
As we have access to $\bU$ and $\bB$, we only need to recover the base scores $\theta_j = \bw^T\bu_j ~\forall j \in [\alpha]$; the remaining scores can then be computed using Equation \ref{eqn:scores}.
For a pair $(i,j)$, under the f-BTL model, the following holds:
\begin{equation}
\label{eqn:linear}
%\label{EQN:LINEAR}
    \displaystyle \sum_{k=1}^{\alpha}\gamma^{ij}_k \theta_k =  \sum_{k=1}^{\alpha}\gamma^{ij}_k(\bw^T\bu_k)= \log \Bigg( \frac{P_{ij}}{P_{ji}} \Bigg)
\end{equation}
where $\gamma_k^{ij} = B_{ki} - B_{kj}$, following the indexing of Equation \ref{eqn:scores}.
Note that, from \eqref{eqn:basis}, this implies $\gamma_k^{ij} = 0$ if $k \notin N(i) \cup N(j)$, as $B_{ki} = B_{kj} = 0$ in that case.
Eqn. \eqref{eqn:linear} shows that knowing $P_{ij}$ for any pair $(i,j)$ gives rise to a linear equation involving only the scores of the items in $\cI(G)$. Since the f-BTL model is invariant to a constant shift of the score vector $\btheta$, we can w.l.o.g. assume that one of the item scores is $0$ (after an appropriate shift). Thus, to recover the item scores, we only need $|\cI(G)| - 1$ linearly independent equations of the type in Eqn. \eqref{eqn:linear}, which can be solved for the scores of the items in $\cI(G)$, i.e., $\{\theta_i\}_{i \in \cI(G)}$. However, if the coefficient $\gamma_k^{ij}$ in \eqref{eqn:linear} is $0$ for the pair/edge $(i,j)$, then that equation does not involve $\theta_k$.  \emph{Thus, the equations of the selected pairs should be such that each item in $\cI(G)$ appears in \emph{at least} one of the equations so that it can be solved for.} 
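
\noindent\textit{Worked derivation.} The following is a short sketch of where Eqn. \eqref{eqn:linear} comes from, assuming the f-BTL model takes the standard BTL form $P_{ij} = e^{\theta_i}/(e^{\theta_i} + e^{\theta_j})$ with $P_{ji} = 1 - P_{ij}$ (an assumption made here for illustration). Under this assumption the log-odds of a compared pair equal the score difference, and substituting Equation \ref{eqn:scores} for both items gives
\begin{align*}
\log \Big( \frac{P_{ij}}{P_{ji}} \Big) = \theta_i - \theta_j = \sum_{k \in \cI(G)} \big( B_{ki} - B_{kj} \big)\, \theta_k ,
\end{align*}
where the indexing of $\bB$ follows Equation \ref{eqn:scores}. The coefficient multiplying $\theta_k$ plays the role of $\gamma_k^{ij}$ in Eqn. \eqref{eqn:linear}, so each observed pair contributes one linear constraint on the $\alpha$ base scores.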

Our problem is thus to compute the number of pairs needed to ensure that, with high probability, each item in $\cI(G)$ appears in at least one equation of the form of Equation \ref{eqn:linear}. To compute this number, we need to explicitly model the dependencies among features. We do this below and prove the necessary result using \emph{Hall's marriage theorem}, a classical result from graph matching theory. We state the theorem below for convenience.
% (refer Section $2$).

\textbf{Hall's Marriage Theorem.} 
\cite{hall1935}
%\label{thm:halls}
Let $C = (A\cup A', E)$ be a finite bipartite graph and, for any $S \subseteq A$, let $N_C(S)$ denote the neighbours of $S$ in $A'$. Then $C$ admits a matching entirely covering $A$ if and only if
$|N_C(S)| \ge |S| ~~ \forall S \subseteq A.$


The bipartite graph $C = (A \cup A', \Delta)$ for our purpose is defined as follows. The set $A$ is the set of items in the independent set, i.e., $A = \cI(G)$ (recall $\cI(G) = [\alpha]$). The set $A'$ consists of ${n \choose 2}$ nodes, each corresponding to an edge $(i,j)$. For an edge $(i,j)$, define 
\begin{equation}
\label{eqn:F}
    F_{ij} = \{ k \in \cI(G) : \gamma_k^{ij} \neq 0 \}
\end{equation}
Thus $F_{ij}$ is a subset of the independent nodes $\cI(G)$ that are adjacent to at least one of the items $i$ and $j$ (as otherwise $\gamma_k^{ij} = 0$, as argued above). Hence, by observing the preference $P_{ij}$ of the pair $(i,j)$, we obtain an equation involving the items in $F_{ij}$. We define the edge set $\Delta$ such that an edge between node $k \in \cI(G)$ and the node for pair $(i,j)$ is present in the bipartite graph $C$ iff $k \in F_{ij}$. For any set of edges $M \subseteq {n \choose 2}$, define the reduced bipartite graph $C_M = (\cI(G) \cup M, \Delta_M)$ by restricting $A'$ to $M$ and defining $\Delta_M$ correspondingly (see Fig. \ref{fig:bi_grph}).
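
\noindent\textit{Illustration.} As a small example with hypothetical values (chosen only to make the construction concrete), suppose $\cI(G) = \{1,2\}$ and consider two pairs $e_1, e_2$ with $F_{e_1} = \{1,2\}$ and $F_{e_2} = \{2\}$. For $M = \{e_1, e_2\}$ the edge set is $\Delta_M = \{(1,e_1), (2,e_1), (2,e_2)\}$, and
\begin{align*}
|N_{C_M}(\{1\})| = 1 \ge 1, \qquad |N_{C_M}(\{2\})| = 2 \ge 1, \qquad |N_{C_M}(\{1,2\})| = 2 \ge 2 ,
\end{align*}
so Hall's condition holds and $1 \mapsto e_1$, $2 \mapsto e_2$ is a matching covering $\cI(G)$. For $M = \{e_1\}$, in contrast, $|N_{C_M}(\{1,2\})| = 1 < 2$, so no such matching exists; correspondingly, a single equation cannot determine two unknown scores.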

\begin{restatable}[]{thm}{hallseqs}
\label{thm:halls-equations}
\label{THM:HALLS-EQUATIONS}
Given a set of edges $M \subseteq {n \choose 2}$, the bipartite graph $C_M = (\cI(G) \cup M, \Delta_{M})$ admits a matching that covers $A = \cI(G)$ iff the system of linear equations induced by the edges in $M$ admits a unique solution.
\end{restatable}
\vspace*{-5pt}
Theorem \ref{thm:halls-equations} gives us a novel way to analyse the number of pairs needed to obtain enough (linearly independent) equations to uniquely solve for the score vector $\btheta$. In particular, to bound the number of pairs needed, we only need to bound the probability that Hall's marriage condition is not met. (This is because, when the condition is met, a covering matching gives $|\cI(G)|$ linearly independent equations to solve for the base scores of the items in $\cI(G)$, i.e., $\{\theta_i\}_{i \in \cI(G)}$.) 
Before we prove the result, we need the following definitions for a given set $M$. Let $M_k$ denote the neighbours of node $k$ in $C_M$. For $I \subseteq \cI(G)$, let $c_{I}  =  | \displaystyle \cup_{k \in I} M_k |$ and $d_{I} =  | \displaystyle \cap_{k \in I} M_k |$. We now prove the main result of this section:

\begin{restatable}[\textbf{Bound On Error Probability}]{thm}{sampcomp}
\label{thm:sampcomp}
\label{THM:SAMPCOMP}
Given a relation graph $G$, a feature matrix $\bU$, a set of pairs $M$ with $|M| = m$ generated according to the sampling model above (where each pair is chosen with probability $p$), and the exact preference probabilities $P_{ij} ~\forall (i,j) \in M$, the probability that the estimated score vector $\hat{\btheta}$, obtained by solving the induced equations, differs from the true score vector $\btheta$ is bounded by

\vspace*{-20pt}
\begin{align*}
\hspace*{-6pt}\bP(\hat{\btheta} \neq \btheta) \le \hspace*{-30pt}\displaystyle \sum_{q = 1}^{\min\{\alpha(G),d_{\max}(G)+1\}} \hspace*{-30pt}\sum_{I \subseteq \cI(G) | |I| = q}\hspace*{-10pt} {d_{I} \choose q-1} p^{q-1} (1-p)^{(c_{I} - (q-1))},
\end{align*}
\vspace*{-20pt}
\end{restatable}
where $d_{\max}(G)$ denotes the maximum degree of $G$.
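
\noindent\textit{Reading the bound.} As a partial illustration, consider only the $q = 1$ terms of the sum (the terms with $q \ge 2$ must be controlled separately). Here we read $M_k$ as the set of candidate pairs whose equation involves $\theta_k$, which is an interpretive assumption. For $I = \{k\}$ we have $c_I = d_I = |M_k|$, $\binom{d_I}{0} = 1$ and $p^{0} = 1$, so using $1 - p \le e^{-p}$,
\begin{align*}
\sum_{k \in \cI(G)} (1-p)^{|M_k|} \;\le\; \sum_{k \in \cI(G)} e^{-p|M_k|} \;\le\; \alpha(G)\, e^{-p \min_{k} |M_k|} .
\end{align*}
Each summand $(1-p)^{|M_k|}$ is the probability that none of the $|M_k|$ pairs involving $\theta_k$ is sampled when pairs are drawn independently with probability $p$. Provided $\min_k |M_k| > 0$, this first-order contribution is at most $\delta$ whenever $p \ge \frac{1}{\min_k |M_k|}\log \big( \frac{\alpha(G)}{\delta} \big)$.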

%The complete proof of Thm. \ref{thm:sampcomp} is given in Appendix \ref{app:nonoise_thm}

%\iffalse %%%%%%%%%%%%%
\begin{proof}\textbf{(sketch)}
From Theorem \ref{thm:halls-equations}, one fails to recover the true $\btheta$ if and only if the bipartite graph $C_M$ (with edge set $\Delta_{M}$) 
admits no matching that covers $A$.
%every $(\alpha-1)$ subsets of $A$, which in turn implies that $\Delta_{M}$ must fail cover $A$ as well. 
Thus:
\begin{align*}
& \bP(\btheta \neq \bhtheta) = \bP(\{A \text{ is not covered by } C_M\})\\
& = \bP(\{\exists S' \subseteq A \mbox{ s.t. } |N_{C_M}(S')| < |S'|\}) \text{ (Hall's Marriage)}
\end{align*}

Now if we denote the event $F_i := \{\exists S' \subseteq A \mbox{ s.t. } |S'| = i \text{ and } S'\mbox{ is not covered by } C_M\}$, $\forall i \in [\alpha(G)]$, and recalling $A = [\alpha(G)]$, one can further show

\vspace{-20pt}
\begin{align}
\label{eq:sampcom_prf1_m}
\nonumber & \bP(\btheta \neq \bhtheta) = \bP(\{ \exists S' \subseteq A \mbox{ s.t.} ~|N_{C_M}(S')| < |S'|\}) \\
\nonumber & = \bP(F_1 \cup F_2 \cup \ldots \cup F_{\alpha(G)}) \le \bP(F_1)\\
& \hspace{10pt} + \bP(F_2 \cap F_1^c) + \ldots + \bP(F_{\alpha(G)} \cap F_{\alpha(G)-1}^c)
\end{align}

Assuming the pairwise node preferences are drawn according to the edges sampled from an Erd\H{o}s-R\'enyi random graph $\mathcal G(n,p)$ and applying Thm. \ref{thm:halls-equations} on the event $F_q \cap F_{q-1}^c$ for any $1\le q \le \alpha(G)$, we get:
\begin{align}
\label{eq:sampcom_prf2_m}
\nonumber & \bP(F_q \cap F_{q-1}^c) = \bP\big(\{\exists S' \subseteq A, |S'| = q, ~S' \text{ is not cover-}\\ 
\nonumber & \text{ed by } C_M \mbox{ and } \forall S'_1 \subset A, |S_1'| < q, ~S_1' \text{ is covered by } C_M\}) \\
&  \hspace{25pt} \le \sum_{I \subseteq \cI(G) \mid |I|=q}\binom{d_I}{q-1}p^{q-1}(1-p)^{c_I-(q-1)},
\vspace{-5pt}
\end{align}
\vspace{-1pt}
where the last inequality follows from the observation that, for any $S' \subseteq A$ with $|S'| = q$, if $S'$ is not covered by $C_M$ but all its proper subsets $S_1' \subset S'$ are, then $\mathcal{G}(n,p)$ must have sampled exactly $q-1$ edges from $\cap_{i \in S'}M_i$ and none from $\big( \cup_{i \in S'}M_i \setminus \cap_{i \in S'}M_i \big)$.
Combining \eqref{eq:sampcom_prf1_m} and \eqref{eq:sampcom_prf2_m}:

\vspace{-15pt}
\begin{align*}
\bP(\btheta & \neq \bhtheta) \le \bP(F_1) + \ldots + \bP(F_{\alpha(G)} \cap F_{\alpha(G)-1}^c) \\
& \le \sum_{q = 1}^{\alpha(G)}\sum_{I \subseteq \cI(G) \mid |I|=q}\binom{d_I}{q-1}p^{q-1}(1-p)^{c_I-(q-1)},
\end{align*}
%\vspace{-5pt}
where we assume ${x \choose y} = 0$ if $x < y$. The result follows by further noting that if $d_{\max}(G) < \alpha(G)$, then $d_{I} = 0$ for any $I \subseteq [\alpha(G)]$ such that $|I| > (d_{\max}+1)$. The complete proof is given in Appendix \ref{app:nonoise_thm}.
\end{proof}
%\fi %%%%%%%%%%%%%%%%%%%%%

\begin{rem}
\emph{The above theorem gives us a way of choosing $p$ such that the probability of violating Hall's condition (and hence not having enough equations to solve) is bounded by a suitable value. As can be seen from the theorem, the quantities of interest are $c_{I}$ and $d_{I}$, which capture the dependencies among the feature vectors of the nodes in the graph.}
\end{rem}

For several graphs, these quantities are easily computable, yielding the following sample complexity bounds:

\begin{restatable}[\textbf{Sample Complexity for Common Graphs}]{thm}{sampcompeg}
\label{thm:sampcomp_eg}
\label{THM:SAMPCOMP_EG}
Under the settings of Theorem \ref{thm:sampcomp}, the sample complexity bounds for the following graphs are: 1. $m = O(n\log(\frac{n}{\delta}))$ for a {\it disconnected graph}, {\it star} graph, or {\it cycle}, 2. $m = O(\log(\frac{1}{\delta}))$ for a {\it clique}, 3. $m = O(r\log(\frac{r}{\delta}))$ for union of $r$ disconnected cliques.
 \end{restatable} 
 
\begin{proof}\textbf{(sketch)}
The results can be obtained by first deriving the exact expression of $\bP(\btheta \neq \bhtheta)$ for the specific graphs and then solving for $p$ after equating it to $\delta$. The required sample size then follows from the expected number of sampled edges $p\binom{n}{2}$. 
E.g., for $r$ disconnected cliques: say $G$ consists of $r \in [n]$ disconnected cliques $G_1, G_2, \ldots, G_r$, each with $d \in [n]$ nodes (i.e., for each $k \in [r], ~|V(G_k)| = d$), so that $n = rd$. In this case $\alpha(G) = r$. Without loss of generality let $\cI(G) = \{1,2, \ldots, r\}$. Then $\forall k \in [r]$, we have $M_k = \{(i,j) \mid (i,j) \in E(G_k)\} \cup \{(k,j) \mid j \in [n]\setminus \{k\}\}$. Thus $n_k = \binom{d}{2} + (r-1)$. Moreover, $\forall I \subseteq \cI(G)$ with $|I| = 2$, we have $c_I = 2(\binom{d}{2} + (r-1)) - 1 = d(d-1) + 2r - 3$ and $d_I =  1$, while for $|I| \ge 3$, $d_I = 0$. 

Then, applying Theorem \ref{thm:sampcomp} and noting $d_{\max}(G) \le \lceil \frac{n}{r} \rceil$, one obtains:
$
\bP(\btheta \neq \bhtheta) \le r^2(e^{-p(\binom{d}{2}+r-1)}).
$
Solving $r^2(e^{-p(\binom{d}{2}+r-1)}) \le \delta$ gives $p \ge \frac{1}{\binom{d}{2} + (r-1)}\log \Big( \frac{r^2}{\delta} \Big)$. Thus the expected number of edges (pairwise preferences) required in the random graph is at least $p\binom{n}{2} = \frac{n(n-1)/2}{d(d-1)/2 + r-1}\log \Big( \frac{r^2}{\delta}\Big) \ge \frac{n(n-1)r^2}{n(n-r)+2r^2(r-1)} \log \Big( \frac{r^2}{\delta}\Big)\ge r \log \Big( \frac{r^2}{\delta} \Big)$, where the last inequality follows assuming $r < \frac{n}{\sqrt{2}}$. Moreover, setting $d = 1$ and $d = n$, we recover the bounds for the disconnected and complete graphs, respectively.
The derivations for all the cases are in Appendix \ref{app:egs}.
\end{proof}
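
\noindent\textit{Numerical illustration.} To make the expressions in the sketch above concrete, consider hypothetical values $n = 100$, $r = 10$ (so $d = n/r = 10$) and $\delta = 0.1$; these numbers are chosen only for illustration. Then $\binom{d}{2} + r - 1 = 45 + 9 = 54$ and $\log(r^2/\delta) = \log(1000) \approx 6.91$ (natural logarithm), so that
\begin{align*}
p \ge \frac{6.91}{54} \approx 0.128, \qquad p\binom{n}{2} \approx 0.1279 \times 4950 \approx 633 .
\end{align*}
The condition $r < n/\sqrt{2} \approx 70.7$ holds, and the value above is consistent with the final inequality of the sketch, since $r\log(r^2/\delta) \approx 69.1 \le 633$.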

 
%\begin{rem}
\textbf{Remark.}
Theorem \ref{thm:sampcomp_eg}
 captures the connection between the structure of the relation graph $G([n],E)$ (induced by the features) and the sample complexity for recovering the item scores $\btheta$ under the f-BTL model. E.g., if the graph is a clique, then there is only one independent vector and we need only $O(1)$ pairwise samples; but for a disconnected graph, star or cycle, where $\alpha = O(n)$, we recover the $O(n \log n)$ result for the BTL model \cite{negahban+12}. Moreover, there are graphs (e.g., $r$ disconnected cliques, where $\alpha = r$) for which the sample complexity scales as $O(\alpha \log \alpha)$ (independent of $n$). Thus we get a significant improvement in the sample complexity by exploiting the structure of the features, which \cite{niranjanRa17} fails to achieve. Sample complexities of a few other graphs, e.g., regular graphs and trees, are discussed in Appendix \ref{app:egs}.
%\end{rem} 


It is also worth noting that the main structural assumption we exploited in Theorem \ref{thm:sampcomp_eg} towards achieving the $O(\alpha \log \alpha)$ sample complexity is the low $\alpha$-dimensional embedding. %The graph theoretic interpretation of Eq1 and $\alpha$ being the independence number of the underlying relation graph can be generalized as described below without affecting our algorithms and results in Sec 4,5.
Indeed, for a more general view of our graph-theoretic framework in \eqref{eqn:basis}, one could take $\mathcal I(G)$ to be an index set of some basis items, where the set $ \{\bu_i \in \mathbb{R}^{\alpha} \mid {i \in \mathcal{I} (G) } \}$ represents a basis of the set of item features {$\bU$}. Further, to mimic \eqref{eqn:basis}, we now assume a corresponding coefficient matrix $\tilde{\mathbf B}$ s.t. $\bU = \tilde{\mathbf B} \bU_{\alpha}$, where $\bU_\alpha$ represents the ``basis matrix'' with the vectors in $\{\mathbf u_i \mid i \in \mathcal I(G)\}$ stacked in the columns of $\mathbf U_\alpha$. 
%
In fact, we do not need to know $\bU_\alpha$ a priori: given the true feature matrix $\bU$, we can derive a basis (by Gaussian elimination or even Gram-Schmidt) that spans the set of item features {$\bU$}. This is precisely what we adopted for our real-data experiments in Section \ref{sec:expt_real}.
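
\noindent\textit{Illustration.} As a minimal example of this basis view, with hypothetical features chosen only for illustration, take three items with $\bu_1 = (1,0)^T$, $\bu_2 = (0,1)^T$ and $\bu_3 = (1,1)^T$. Elimination identifies the dependency
\begin{align*}
\bu_3 = \bu_1 + \bu_2 \quad \Longrightarrow \quad \theta_3 = \bw^T\bu_3 = \bw^T\bu_1 + \bw^T\bu_2 = \theta_1 + \theta_2 ,
\end{align*}
so $\{1,2\}$ can serve as the basis index set and item $3$ has coefficients $(1,1)$ with respect to it. Assuming the log-odds form of Eqn. \eqref{eqn:linear}, comparing the pair $(3,1)$ then gives $\log ( P_{31}/P_{13} ) = \theta_3 - \theta_1 = \theta_2$, an equation in the base scores only.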