% \documentclass{uai2022} % for initial submission
\documentclass[accepted]{uai2022} % after acceptance, for a revised
                                    % version; also before submission to see how
                                    % the non-anonymous paper would look like
%% There is a class option to choose the math font
% \documentclass[mathfont=ptmx]{uai2022} % ptmx math instead of Computer
                                         % Modern (has noticable issues)
% \documentclass[mathfont=newtx]{uai2022} % newtx fonts (improves upon
                                          % ptmx; less tested, no support)
% NOTE: Only keep *one* line above as appropriate, as it will be replaced
%       automatically for papers to be published. Do not make any other change
%       above this note for an accepted version.

%% Choose your variant of English; be consistent
\usepackage[american]{babel}
% \usepackage[british]{babel}

%% Some suggested packages, as needed: \usepackage{natbib} % has a nice set of
% citation styles and commands \bibliographystyle{plainnat}
% \renewcommand{\bibsection}{\subsubsection*{References}} If you use natbib
% package, activate the following three lines:
\usepackage[round]{natbib}
\bibliographystyle{plainnat}
% \bibliographystyle{apalike}
\renewcommand{\bibname}{refs}
\renewcommand{\bibsection}{\subsubsection*{\bibname}}

% \usepackage{mathtools} % amsmath with fixes and additions \usepackage{siunitx}
% % for proper typesetting of numbers and units
\usepackage{booktabs} % commands to create good-looking tables
\usepackage{tikz} % nice language for creating drawings and diagrams

%% Provided macros \smaller: Because the class footnote size is essentially
% LaTeX's \small, redefining \footnotesize, we provide the original
% \footnotesize using this macro. (Use only sparingly, e.g., in drawings, as it
% is quite small.)

\title{$\ell_{\infty}$-Bounds of the MLE in the BTL Model under General
Comparison Graphs}

% The standard author block has changed for UAI 2022 to provide more space for
% long author lists and allow for complex affiliations
%
% All author information is automatically removed by the class for the
% anonymous submission version of your paper, so you can already add your
% information below.
%
% Add authors
\author[1]{\href{mailto:<wanshanl@andrew.cmu.edu>?Subject=Your UAI 2022
paper}{Wanshan~Li}{}}
\author[1]{\href{mailto:<sshrotri@andrew.cmu.edu>?Subject=Your UAI 2022
paper}{Shamindra~Shrotriya}{}}
\author[1]{\href{mailto:<arinaldo@andrew.cmu.edu>?Subject=Your UAI 2022
paper}{Alessandro~Rinaldo}{}}
% Add affiliations after the authors
\affil[1]{%
    Department of Statistics \& Data Science \\
    Carnegie Mellon University \\
    Pittsburgh, Pennsylvania, USA }

% Add colorbox Source:
% https://tex.stackexchange.com/questions/66154/how-to-construct-a-coloured-box-with-rounded-corners
\usepackage{tcolorbox}
\usepackage{float}

\usepackage{csquotes}
\newenvironment{itquote}
{\begin{quote}\itshape} {\end{quote}}

\newcommand{\btldima}{n}
\newcommand{\btldimb}{N_{\text{comp}}}
\newcommand{\btldimc}{\theta}
\newcommand{\btl}{BTL}

% Replace default LaTeX tt environment font ------------------------------------
% Source: https://twitter.com/xtimv/status/1434594253720760324
\usepackage{inconsolata}

% Add a forced line break inside a table cell Source:
% https://tex.stackexchange.com/a/176780/180731
\newcommand{\specialcell}[2][c]{%
  \begin{tabular}[#1]{@{}c@{}}#2\end{tabular}}

% Setting the default graphicspath
\usepackage{graphicx} %Loading the package
\graphicspath{{figures/}} %Setting the graphicspath

% Custom math macros -----------------------------------------------------------
% \usepackage{./00_custom_macros_01_proof_envs}
% \usepackage{./00_custom_macros_02_operators_commands}
% \usepackage{./00_custom_macros_03_symbols}
\usepackage{000_custom_macros_comb}

% Ensure that we reference the external document -------------------------------
% Use {xr} with {cleveref} so that we can cite across docs Source:
% https://tex.stackexchange.com/a/244272/180731
\usepackage{xr}
\usepackage{xr-hyper}
% \usepackage{hyperref}
% \usepackage{hyperref}       % hyperlinks
% \hypersetup{colorlinks = true, linkcolor = blue, anchorcolor = blue, citecolor =
%             blue, filecolor = blue, urlcolor = blue }
% \RequirePackage[breaklinks]{hyperref}[2019/11/10]       
\usepackage[breaklinks]{hyperref}[2019/11/10]

% Ensure that Figure ## is bold in figure captions
\usepackage[labelfont=bf]{caption}
% simple URL
\usepackage{url}
%nice reference package for automatically choosing names for references
\usepackage[capitalize,sort,compress]{cleveref}

% Link to the external supplementary document
\externaldocument[supp:]{li_649-supp}

% In your preamble

\makeatletter
\newcommand*{\addFileDependency}[1]{% argument=file name and extension
  \typeout{(#1)}
  \@addtofilelist{#1}
  \IfFileExists{#1}{}{\typeout{No file #1.}}
}
\makeatother

\newcommand*{\myexternaldocument}[1]{%
    \externaldocument{#1}%
    \addFileDependency{#1.tex}%
    \addFileDependency{#1.aux}%
}

\myexternaldocument{li_649-supp}

\begin{document}
\maketitle

\begin{abstract}
  The Bradley-Terry-Luce (\btl{}) model is a popular statistical approach for
  estimating the global ranking of a collection of items using pairwise
  comparisons. To ensure accurate ranking, it is essential to obtain precise
  estimates of the model parameters in the $\ell_{\infty}$-loss. The difficulty
  of this task depends crucially on the topology of the pairwise comparison
  graph over the given items. However, beyond very few well-studied cases, such
  as the complete and Erd\"os-R\'enyi comparison graphs, little is known about
  the performance of the maximum likelihood estimator (MLE) of the \btl{} model
  parameters in the $\ell_{\infty}$-loss under more general graph topologies.
  %
  In this paper, we derive novel, general upper bounds on the $\ell_{\infty}$
  estimation error of the \btl{} MLE that depend explicitly on the algebraic
  connectivity of the comparison graph, the maximal performance gap across items
  and the sample complexity. We demonstrate that the derived bounds perform well
  and in some cases are sharper compared to known results obtained using
  different loss functions and more restricted assumptions and graph topologies.
  We carefully compare our results to \citet{yan2012sparsecompbtl}, which is
  closest in spirit to our work. We further provide minimax lower bounds under
  $\ell_{\infty}$-error that nearly match the upper bounds over a class of
  sufficiently regular graph topologies. Finally, we study the implications of
  our $\ell_{\infty}$-bounds for efficient (offline) tournament design. We
  illustrate and discuss our findings through various examples and simulations.
\end{abstract}

% -----------------------------------------------------------------
\section{Introduction}\label{sec:introduction}

Simultaneous or `global' ranking of a set of items is a practical problem that
arises naturally in a variety of domains. For example, one may wish to ascertain
a `best player' or `best team' in a given sports league. Designing a principled
statistical approach to global ranking of items is challenging due to data
limitations and complex domain-specific relationships between the underlying
items to be ranked.

A popular and practicable solution  to estimating global ranking  is to utilize
pairwise comparison information across the items to be ranked, which is easily
accessible across many application domains. The \btl{} model
\citep{bradley1952rank,Luce59} is a popular statistical model for pairwise
comparison data. A similar model was studied earlier by \cite{Zermelo1929}. The
continued practical and theoretical interest in the
\btl{}  model stems from its relatively simple parametric form which provides a
good balance between interpretability and tractability for theoretical analysis.
The \btl{} model is domain-agnostic, making it an ideal benchmarking tool across
a variety of ranking applications, \eg sports analytics
\citep{FaT1994,MaV2012,CMV2012} and bibliometrics \citep{St1994, Va2016}.

Formally, we can describe the \btl{}  model as follows. Suppose that we have $n$
distinct items, each with a (fixed but unobserved) positive strength or
preference score $w^{*}_{i}$, $i \in [n]$, quantifying item $i$'s propensity to
beat other items in pairwise comparisons. The \btl{} model assumes that the
comparisons between different pairs are independent and the outcomes of
comparisons between any given pair, say item $i$ and item $j$, are \iid
Bernoulli random variables, with \textit{winning probability} $p_{ij}$, defined
as
\begin{equation}\label{neqn:bradley_terry_prob_succ}
  p_{ij}
  \defined \Prb{i \text{ beats } j}
  \defined \frac{w^{*}_{i}}{w^{*}_{i} + w^{*}_j}, \: \forall \; i,j \in [n].
\end{equation}
A common reparametrization is to set, for each $i$,  $w^{*}_{i} =
  \exp(\theta^*_i)$, where $\boldsymbol{\theta}^* \defined (\theta^*_{1},
  \ldots, \theta^*_{n})^{\top} \in \reals^{n}$. By convention, we assume that
$\sum_{i \in [n]} \theta^*_i = 0$ for parameter identifiability.
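
To make the reparametrization concrete, the following short computation (an
illustration that uses only the definitions above) rewrites the winning
probability in \eqref{neqn:bradley_terry_prob_succ} in terms of
$\boldsymbol{\theta}^*$:
\begin{equation*}
  p_{ij}
  = \frac{e^{\theta^*_i}}{e^{\theta^*_i} + e^{\theta^*_j}}
  = \frac{1}{1 + e^{-(\theta^*_i - \theta^*_j)}}.
\end{equation*}
For instance, $\theta^*_i - \theta^*_j = \log 3$ gives $p_{ij} = 1/(1 + 1/3) =
3/4$. Since $p_{ij}$ depends on $\boldsymbol{\theta}^*$ only through the
differences $\theta^*_i - \theta^*_j$, adding the same constant to every
coordinate of $\boldsymbol{\theta}^*$ leaves every $p_{ij}$ unchanged, which is
why a normalization such as $\sum_{i \in [n]} \theta^*_i = 0$ is needed.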

From a theoretical perspective, much attention in the \btl{} literature has been
paid to two popular estimators, namely the maximum likelihood estimator (MLE)
and the spectral method \citep{jain2020spectralmethodscarcedata}. Recently,
\cite{chen2020partialtopkranking} show that the MLE attains a sharper minimax
rate for the Hamming top-$k$ loss than the spectral method. In this paper,
we thus focus on the MLE, which we formally define later in
\Cref{sec:upper-bounds}.

\noindent{\bf General pairwise comparison
  graphs}\label{subsec:optimality-mle-gen-topology}

Given $n$ items to be compared, the pairwise comparison scheme among them can be
expressed through an undirected simple graph $\mclG(V, E)$, where the vertex set
$V \defined [n]$ and the edge set $E \defined \{(i,j): i \text{ and } j \text{
    are compared }\}$ is determined by the comparison scheme.  Correspondingly, if
we define the directed edge set as $E_d\defined \{(i,j,k): (i \text{ beats } j)
  \text{ $k$ times}\}$, then the induced directed simple graph $\mclG(V,E_d)$ is
called a \textit{directed} comparison graph. It is a classical result
\citep{ford1957,simons1999,hunter2004mm} that the \btl{} model is identifiable
if and only if $\mclG(V,E)$ is connected, and the MLE of the model parameters
exists and is consistent if and only if $\mclG(V,E_d)$ is strongly connected.
Henceforth, \textit{comparison graph} refers to the undirected pairwise
comparison graph.

Typically one is interested in sharp bounds for the estimation risk,
which could be based on a norm-induced metric $ \|\hat{\bbrtheta} -
  \bbrtheta^*\|_{p}$ or a ranking metric, \eg, Kendall's tau distance
\citep{kendall1938}. What makes the risk analysis of \btl{} model estimators
particularly challenging is the combination of the loss function
considered and the assumptions on the topology of $\mclG(V,E)$.

\noindent{\bf Core questions of
  interest}\label{subsec:core-question-of-interest}

% \textcolor{red}{Below we provide a brief summary of some of the more recent
% theoretical contributions to the study of the properties of the \btl{} model
% in high-dimensional settings}.
Among all the metrics measuring uncertainty of estimators of \btl{} parameters,
the $\ell_\infty$-loss directly connects with ranking metrics, \eg binary and
Hamming top-$k$ (partial) ranking loss \citep[see,
  \eg][]{chen2019spectralregmletopk,chen2020partialtopkranking}.

It is thus natural to study the MLE for the \btl{} parameters in the
$\ell_{\infty}$-loss, to better understand the risk optimality of the MLE and
further justify its use for practical global and partial ranking problems. In
this spirit, \cite{yan2012sparsecompbtl} focus specifically on proving
$\ell_{\infty}$-error bounds for the \btl{} MLE for general comparison graphs.
However, a notable limitation in their setting is that they impose a strictly
dense comparison graph assumption, which may be impractical in many real world
applications. This leaves a gap in the literature, summarized in the following
questions:

\begin{tcolorbox}
  \begin{itquote}
    \textbf{Core questions:} For the \btl{} model, how does the MLE perform with
    respect to the $\ell_{\infty}$-loss under much weaker assumptions on the
    pairwise comparison graph than those of \citet{yan2012sparsecompbtl}, that
    is, assuming only that the comparison graph is connected? Moreover, what are
    the implications of such bounds in applications?
  \end{itquote}
\end{tcolorbox}

Providing sharp answers to these questions, together with a detailed comparison
to recent theoretical results in the \btl{} literature, motivates our work in
this paper.

\noindent{\bf Relevant and related literature}

We give a brief overview of the work that addresses the challenge of comparison
graph topology in ranking. When the comparison graph is a complete graph,
\cite{simons1999} give a high-probability upper bound for the $\ell_{\infty}$
loss, \ie, $\|\hat{\bbrtheta} - \bbrtheta^*\|_{\infty}$ and obtain the
asymptotic distribution of the MLE. In the setting where the comparison graph
follows the Erd\"os-R\'enyi graph model,
\cite{chen2015spectralmletopkpairwisecomparison},
\cite{chen2019spectralregmletopk}, \cite{chen2020partialtopkranking} and
\cite{han2020asymptoticsparsebradleyterry} derive high-probability upper bounds
for the $\ell_{\infty}$ loss. Moreover,
\cite{chen2019spectralregmletopk}
show that both the MLE and the spectral method are minimax optimal in terms of
the binary top-$k$ ranking loss, \ie, whether the items with the highest $k$ out
of $n$ preference scores are perfectly identified;
\cite{chen2020partialtopkranking} consider a Hamming loss for the top-$k$ items
and show that the MLE is minimax optimal, in contrast to the spectral method,
with the differences arising in constant factors.

For a broader class of comparison graphs beyond complete and Erd\"os-R\'enyi
graphs, researchers have studied the explicit dependence of the estimation risk
on graph topology. In particular, \cite{yan2012sparsecompbtl} give a
high-probability upper bound for the $\ell_{\infty}$-loss for relatively dense
graphs.
\cite{hajek2014minimaxinferencepartialrank,shah2015estimationfrompairwisecomps}
give a high-probability upper bound for the $\ell_2$ or Euclidean loss
$\|\hat{\bbrtheta} - \bbrtheta^*\|_{2}$, establish upper and lower bounds on
$\mbbE{\|\hat{\bbrtheta} - \bbrtheta^*\|_{2}}$ and show the minimax optimality
of the constrained MLE across a wide range of graph topologies. Recently,
\cite{agarwal2018acceleratedspectralranking} give sharp upper bounds for a novel
spectral method in the $\ell_1$-loss $\|\hat{\bbrpi} - \bbrpi^*\|_{1}$ for
$\bbrpi^* = \bfw^*/\|\bfw^*\|_1$ instead of $\bbrtheta^*$.
\cite{hendrickx2019graphresistance, hendrickx2020minimaxpairwisebtl} propose a
weighted least squares method to estimate $\bfw^*$ and prove a sharp upper bound
for their estimator in $\mathbb{E}[\sin^2(\hat{\bfw},\bfw^*)]$ or equivalently
in $\mathbb{E}\|\hat{\bfw}/\|\hat{\bfw}\|_2 - \bfw^*/\|\bfw^*\|_2\|^2_2$, in the
sense that this upper bound matches an instance-wise lower bound up to constant
factors. \looseness=-1

\noindent{\bf Contributions}

Our contributions in this paper are fourfold and are summarized as follows:
\bitems
\item \textbf{Upper bounds:} We derive a novel upper bound for the
$\ell_{\infty}$-error of the regularized MLE in the \btl{} model allowing for
general graph topology. Our upper bounds hold under minimal assumptions on graph
topologies, \ie, assuming only that the comparison graph is connected. Despite
this generality, we show that our $\ell_\infty$ bound is tighter than existing
results under a broad range of graph topologies, and works well in general. In
particular, we carefully compare our work analytically and in simulation to
\citet{yan2012sparsecompbtl}, which is closest in spirit to our work.

As a minor corollary, our techniques yield state-of-the-art
$\ell_{2}$-loss bounds for the Erd\"os-R\'enyi graph.
\item \textbf{Lower bounds:} We derive minimax lower bounds for \btl{} parameter
estimation in $\ell_{\infty}$-loss. We analyze specific graph topologies
satisfying certain regularity connectivity conditions under which the \btl{} MLE
is nearly minimax optimal.
\item \textbf{Implications for tournament design:} We show that the \btl{} MLE
in $\ell_{\infty}$-loss satisfies a unique subadditivity property, and show how
our $\ell_{\infty}$ bounds can exploit this property for efficient (offline)
tournament design.
\item \textbf{Extension to the unregularized \btl{} model:} We also extend our
upper bounds under $\ell_{\infty}$-loss to the unregularized (`vanilla') \btl{}
MLE, which is also frequently used in practice.
\eitems
Due to the more complicated form of the vanilla \btl{} MLE upper bounds and
space limitations, we present these analogous results and their proofs
separately in \Cref{supp:sec:vanilla-mle}. Henceforth, MLE refers to the
regularized \btl{} MLE unless stated otherwise. In addition to our theoretical
contributions, a core aim throughout the paper is to emphasize the
interpretability of our results, the associated assumptions, and implications
for practical ranking tasks.

\noindent {\bf Organization of the paper}

The rest of the paper is organized as follows. In \Cref{sec:upper-bounds}, we
present our main results for the upper bound in \Cref{nthm:thm1} and an
interpretation of the key components of the bound. In \Cref{sec:lower-bounds},
we discuss minimax lower bounds under the $\ell_{\infty}$-loss in
\Cref{nthm:thm2-lb}. In \Cref{sec:implications-of-work}, we show some practical
implications of our results in efficient tournament design from a ranking
perspective. In \Cref{sec:simulations}, we conduct extensive numerical
simulations to validate the optimality of our bounds compared to related results
in the literature.

\noindent {\bf Notation}

We typically use lowercase for scalars, \eg, $(x, y, z, \ldots)$, boldface
lowercase for vectors, \eg, $(\bfx, \bfy, \bfz, \ldots)$, and boldface uppercase
for matrices, \eg $(\bfX, \bfY, \bfZ, \ldots)$. We denote the finite set
$\theseta{1, \ldots, n}$ by $[n]$. For asymptotics, we denote $x_n\lesssim y_n$
or $x_n = O(y_n)$ and $u_n\gtrsim v_n$ or $u_n = \Omega(v_n)$ if $\forall n$,
$x_n \leq c_1 y_n$ and $u_n\geq c_2 v_n$ for some constants $c_1,c_2>0$. We
denote by $\bfe_i$ the vector whose entries are all $0$ except that the $i$-th
entry is $1$. We write $a_n = o(b_n)$ if $a_n/b_n\rightarrow 0$ as $n\rightarrow
  \infty$ and, conversely, $a_n = \omega(b_n)$ if $b_n/a_n\rightarrow 0$ as
$n\rightarrow \infty$. We denote by $\textbf{1}_n \in \reals^{n}$ the vector of
ones.

\section{Upper bounds}\label{sec:upper-bounds}

Recall that given $n$ items to be compared, the comparison scheme among them
defines the comparison graph $\mclG(V, E)$, where $V = [n]$ and $E = \{(i,j): i
  \text{ and } j \text{ are compared }\}$. We denote the corresponding adjacency
matrix by $\bfA\in \mathbb{R}^{n\times n}$, and its $(i,j)^{\text{th}}$ entry is
$A_{ij} \defined 1\{(i,j)\in E\}$. The associated (unnormalized) graph Laplacian
is the symmetric, positive-semidefinite matrix   $\mclL_{\bfA}\defined\bfD -
  \bfA$, where $\bfD = \mathrm{diag}(n_1,\ldots, n_n)$, with
$n_{i}\defined\sum_{j=1}^n A_{ij}$ the degree of node $i$. It is well known that
the smallest eigenvalue of $\mclL_{\bfA}$ is $0$ with an eigenvector
$\textbf{1}_n$. Let $\lambda_2(\mclL_\bfA)$ be the second smallest eigenvalue of
$\mclL_\bfA$, known as the algebraic connectivity of $\mathcal{G}$
\citep{laplacian2004}; then $\mclG$ is connected if and only if
$\lambda_2(\mclL_\bfA) >0$. Following standard practice in the \btl{}
literature, we assume that for each edge $(i,j)$ of the comparison graph, the
corresponding items $i$ and $j$ are compared $L$ times, each leading to an
independent outcome $y_{ij}^{(l)}\in \{0,1\}$, where $l \in [L]$. If pairs are
compared a different number of times, we take $L$ to be the smallest number of
pairwise comparisons
over the edge set, as a worst-case scenario. The corresponding sample averages
are denoted with $\bar{y}_{ij} = \frac{1}{L}\sum_{l = 1}^L y_{ij}^{(l)}$ and are
sufficient statistics for the model parameters. The $\ell_{2}$-regularized MLE
is defined as
\begin{equation}\label{eq:reg.mle}
  \hat{\bbrtheta}_{\rho} = \argmin_{{\bf 1}_n^\top \bbrtheta = 0} \ell_{\rho}(\bbrtheta;\bfy),\  \ell_{\rho}(\bbrtheta;\bfy) =\ell(\bbrtheta;\bfy) + \frac{\rho}{2} \|\bbrtheta\|_2^2,
\end{equation}
where $\ell(\bbrtheta;\bfy)$ is the negative log-likelihood, given by
\begin{equation}
  \begin{split}
    \ell(\bbrtheta;\bfy) \defined - &\sum_{1\leq i<j\leq n} A_{ij}\lbrace\bar{y}_{ij}\log {\psi(\theta_i - \theta_j)} \\
    &+ (1 - \bar{y}_{ij})\log [{1 - \psi(\theta_i - \theta_j)}]\rbrace,
  \end{split}
  \label{eq:reg_loglikelihood}
\end{equation}
and $t \in \mathbb{R} \mapsto \psi(t) = 1/[1 + e^{-t}]$ is the sigmoid
function.
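
For orientation, differentiating the objective in \eqref{eq:reg.mle} gives the
gradient used by gradient-based analyses of the regularized MLE. The display
below is a sketch under the convention, assumed here for the illustration, that
$\bar{y}_{ji} \defined 1 - \bar{y}_{ij}$ for $i<j$. Using $\psi'(t) =
\psi(t)[1 - \psi(t)]$ and $1 - \psi(t) = \psi(-t)$,
\begin{equation*}
  \frac{\partial \ell_{\rho}(\bbrtheta;\bfy)}{\partial \theta_i}
  = \sum_{j \neq i} A_{ij}\bigl[\psi(\theta_i - \theta_j) - \bar{y}_{ij}\bigr]
  + \rho\,\theta_i,
  \qquad i \in [n].
\end{equation*}
Each summand is the gap between the model winning probability and the observed
winning frequency of item $i$ over item $j$. With $\rho = 0$, the gradient
therefore vanishes exactly when, for every item, the predicted and observed
numbers of wins over its neighbors agree.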

Under this notational setup, we are ready to state the $\ell_{\infty}$ upper
bound of the \btl{} MLE in \Cref{nthm:thm1}.

\bnthm \label{nthm:thm1} Assume the \btl{} model with parameter $\bbrtheta^* =
  (\theta^*_1,\ldots,\theta^*_n)^\top$ such that ${\bf 1}_n^\top \bbrtheta^* =0$
and a comparison graph $\mathcal{G} = \mclG([n],E)$ with adjacency matrix
${\bf A}$, algebraic connectivity $\lambda_2(\mclL_\bfA)$ and maximum and
minimum degrees $n_{\max}$ and $n_{\min}$. Suppose that each pair of items
$(i,j)\in E$ is compared $L$ times. Let $\kappa = \max_{i,j}|\theta^*_i -
  \theta^*_j|$ and $\kappa_E = \max_{(i,j)\in E}|\theta^*_i - \theta^*_j|$ and
set $\rho \geq c_{\rho}\kappa^{-2}e^{-2.5\kappa_E}n^{-4}n_{\max}^{1/2}$.
Assume that $\mclG$ is connected, or equivalently $\lambda_2(\mclL_\bfA) >0$.
Then with
probability at least $1 - O(n^{-4})$, the regularized MLE
$\hat{\bbrtheta}_{\rho}$ from \eqref{eq:reg.mle} satisfies
\begin{align}
  \|\hat{\bbrtheta}_{\rho} - \bbrtheta^*\|_{\infty} \lesssim & \frac{e^{2\kappa_E}}{\lambda_2} \frac{n_{\max}}{n_{\min}}\parens{\sqrt{\frac{n+r}{L}} + \rho \kappa\sqrt{\frac{n}{n_{\max}}}} \nonumber \\
                                                             & + \frac{e^{\kappa_E}}{\lambda_2} \sqrt{\frac{n_{\max} (\log n + r)}{L}},
  \label{eq:thm1}                                                                                                                                                                                      \\
  \|\hat{\bbrtheta}_{\rho} - \bbrtheta^*\|_2 \lesssim        & \frac{e^{\kappa_E}}{\lambda_2} \parens{ \sqrt{\frac{n_{\max}(n+r)}{L}} + \rho\kappa \sqrt{n}}
  \label{eq:l2.rate}
\end{align}
where $\lambda_2 = \lambda_2(\mclL_\bfA)$ and $r \defined \kappa_E + \log
\kappa$, provided that $L\leq n^8e^{5\kappa_E}\max\{1,\kappa\}$ and $L$ is large
enough so that the right hand side of \Cref{eq:thm1} is smaller than a
sufficiently small constant $C>0$. In particular, if we set $\rho =
  \frac{c_\rho}{\kappa}\sqrt{\frac{n_{\max}}{L}}$ for some $c_\rho>0$, then
\begin{align}
  \|\hat{\bbrtheta}_{\rho} - \bbrtheta^*\|_{\infty} \lesssim & \frac{e^{2\kappa_E}}{\lambda_2} \frac{n_{\max}}{n_{\min}}\sqrt{\frac{n + r}{L}} + \frac{e^{\kappa_E}}{\lambda_2} \sqrt{\frac{n_{\max} (\log n + r)}{L}}, \nonumber \\
  \|\hat{\bbrtheta}_{\rho} - \bbrtheta^*\|_2 \lesssim        & \frac{e^{\kappa_E}}{\lambda_2} \sqrt{\frac{n_{\max}(n+r)}{L}}.
  \label{eq:thm1_optimal}
\end{align}
\enthm
As a brief sketch, the proof is based on a gradient descent procedure
initialized at $\bbrtheta^{(0)} = \bbrtheta^*$ and the idea is to control
$\|\bbrtheta^{(T)} - \hat{\bbrtheta}_{\rho}\|_{\infty}$ using the linear
convergence property and $\|\bbrtheta^{(T)} - {\bbrtheta^*}\|_{\infty}$ using
the leave-one-out technique in \cite{chen2019spectralregmletopk} and
\cite{chen2020partialtopkranking}. In fact, our work confirms that such a line
of argument extends to more general graph topologies beyond the Erd\"os-R\'enyi
graph, which is non-trivial. The proof details can be found in
\Cref{sec:prf_upper_bounds}.
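
The passage from \Cref{eq:thm1,eq:l2.rate} to \Cref{eq:thm1_optimal} is a direct
substitution, spelled out here for convenience. With $\rho =
\frac{c_\rho}{\kappa}\sqrt{\frac{n_{\max}}{L}}$, the two regularization terms
become
\begin{align*}
  \rho\kappa\sqrt{\frac{n}{n_{\max}}}
   & = c_\rho\sqrt{\frac{n_{\max}}{L}}\sqrt{\frac{n}{n_{\max}}}
  = c_\rho\sqrt{\frac{n}{L}}
  \leq c_\rho\sqrt{\frac{n+r}{L}},                                \\
  \rho\kappa\sqrt{n}
   & = c_\rho\sqrt{\frac{n_{\max}\,n}{L}}
  \leq c_\rho\sqrt{\frac{n_{\max}(n+r)}{L}},
\end{align*}
where the inequalities hold whenever $r \geq 0$, which is the case for instance
when $\kappa \geq 1$. The regularization terms are thus absorbed, up to
constants, into the corresponding sampling terms.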

\noindent {\bf Interpretation of key terms}

The upper bound in \Cref{eq:thm1} contains several distinct terms, which
interact with each other in non-trivial ways and express different aspects of
the intrinsic difficulty of the estimation task.
\begin{itemize}
  \item
        The factor  $\frac{e^{\kappa_E}}{\lambda_2(\mclL_{\bfA})}$ combines two
        sources of statistical hardness: the \textit{maximal gap} in performance
        $\kappa_E$ among the ranked items over the edge set $E$, and the
        \textit{algebraic connectivity} $\lambda_2(\mclL_{\bfA})$ of the
        comparison graph. It is intuitively clear that the larger the
        performance gap among the compared items, the more difficult it is to
        accurately estimate the model parameters. Furthermore, the smaller the
        algebraic connectivity, the less connected the comparison graph is, due
        to the presence of bottlenecks\footnote{Here, bottlenecks can be
          formally described as small connected subgraphs with very few edges
          separating dense portions of the graph.}. This in turn will increase the
        chance of obtaining a highly erroneous ranking or of gathering data from
        which a global ranking cannot be elicited at all. The minimal and
        maximal degrees $n_{\min}$ and $n_{\max}$ further quantify the impact of
        the connectivity of the comparison graph.
  \item We note that the factor $\frac{1}{\lambda_2(\mclL_{\bfA})}$ can be
        equivalently replaced with $\frac{1}{\lambda_2(\mclI)}$ (see
        \Cref{lm:lem8} in \Cref{sec:prf_upper_bounds}). Here, $\mclI
          \defined \nabla^2 \ell_0(\bbrtheta^*;\bfy)$ is the Fisher information
        matrix at $\bbrtheta^*$ and $\lambda_2(\mclI)$ its smallest  non-zero
        eigenvalue. The dependence of the bound on the Fisher information is
        not surprising, since in exponential families this quantity
        quantifies the curvature of the likelihood and the intrinsic
        difficulty of estimating $\bbrtheta^*$.
  \item Our bounds depend on both $\kappa$ and $\kappa_E$, which is non-standard
        in the literature. By definition, $\kappa_E \leq \kappa$ and in many
        cases, $\kappa_E$ can be much smaller than $\kappa$. We discuss this
        further in \Cref{sec:simulations}.
  \item The term $r \defined \kappa_E  +\log\kappa$ shows the impact of large
        $\kappa$ and $\kappa_E$. When $\kappa\lesssim n$ and $\kappa_E\lesssim
          \log n$, $r$ is negligible. We will consider this parameter range
        throughout the paper unless stated otherwise.
  \item The term $\sqrt{\frac{n}{L}}$ describes explicitly the impact of a
        high-dimensional parameter space on the estimation problem in relation
        to $L$, the number of samples for each comparison, which can be thought
        of as a measure of the sample size required for each of the $n$
        parameters. The inverse root dependence on $L$ is to be expected and, we
        conjecture, not improvable.
\end{itemize}
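
The first two items can be connected by a standard computation, sketched here as
an illustration rather than as a restatement of the proof. The second derivative
of each summand of the negative log-likelihood with respect to $\theta_i -
\theta_j$ equals $\psi'(\theta_i - \theta_j)$, regardless of the data, so that
\begin{equation*}
  \mclI
  = \sum_{1\leq i<j\leq n} A_{ij}\,\psi'(\theta^*_i - \theta^*_j)\,
  (\bfe_i - \bfe_j)(\bfe_i - \bfe_j)^{\top},
  \qquad
  \psi'(t) = \frac{1}{2 + e^{t} + e^{-t}}.
\end{equation*}
For $(i,j)\in E$ we have $|\theta^*_i - \theta^*_j| \leq \kappa_E$, hence
$\frac{1}{4e^{\kappa_E}} \leq \psi'(\theta^*_i - \theta^*_j) \leq \frac{1}{4}$.
Since $\mclL_{\bfA} = \sum_{i<j} A_{ij}(\bfe_i - \bfe_j)(\bfe_i -
\bfe_j)^{\top}$, this yields
\begin{equation*}
  \frac{1}{4e^{\kappa_E}}\,\mclL_{\bfA}
  \preceq \mclI
  \preceq \frac{1}{4}\,\mclL_{\bfA},
  \qquad\text{and therefore}\qquad
  \frac{\lambda_2(\mclL_{\bfA})}{4e^{\kappa_E}}
  \leq \lambda_2(\mclI)
  \leq \frac{\lambda_2(\mclL_{\bfA})}{4}.
\end{equation*}
In particular, $1/\lambda_2(\mclI) \leq 4e^{\kappa_E}/\lambda_2(\mclL_{\bfA})$,
which is consistent with the factor $e^{\kappa_E}/\lambda_2(\mclL_{\bfA})$
appearing in the bound.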

\bnrmk\label{nrmk:thm1-conditions} In the case of dense graphs, \eg, complete
graphs, $\lambda_2(\mclL_{\bfA})$ is large enough so that even $L = 1$ will
ensure a consistent estimator as $n\rightarrow \infty$. But for sparse graphs,
$L$ needs to be larger to compensate for weaker connectivity. The assumption
that $L\leq n^8e^{5\kappa_E}\max\{1,\kappa\}$ is a technical condition. There is
nothing special about the exponent of $n$: any fixed number larger than $8$ can
be used, which only affects the constants in the bounds. The condition $L \leq
  n^8e^{5\kappa_E}\max\{1,\kappa\}$ may seem counter-intuitive, since it places an
upper bound on the sample size. But a control over $L$ is needed because as $L$
gets larger, the optimal choice of the regularization parameter $\rho = c_\rho
  \frac{1}{\kappa}\sqrt{\frac{n_{\max}}{L}}$ gets smaller and, accordingly, the
convergence rate of the gradient descent procedure upon which our proof is based
degrades. The optimal choice $\rho =
  c_{\rho}\frac{1}{\kappa}\sqrt{\frac{n_{\max}}{L}}$ depends on $\kappa$, which is
unknown before an estimator is produced. However, one can set $\rho =
  c_{\rho}\sqrt{\frac{n_{\max}}{L}}$ and the upper bound will only change by a
factor $\max\{1,\kappa\}$ in the first term of \Cref{eq:thm1_optimal}.
\enrmk
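
As a sanity check on the first claim in \Cref{nrmk:thm1-conditions}, consider
the complete graph on $n$ items. This is a worked illustration which assumes
that $\kappa_E$ is bounded by a constant and that $r$ is negligible. Here
$\mclL_{\bfA} = n\,\mathbf{I}_n - \textbf{1}_n\textbf{1}_n^{\top}$, whose
eigenvalues are $0$ (with eigenvector $\textbf{1}_n$) and $n$ (with
multiplicity $n-1$), so that $\lambda_2(\mclL_{\bfA}) = n$ and $n_{\max} =
n_{\min} = n-1$. Substituting into the first bound of \Cref{eq:thm1_optimal}
gives
\begin{equation*}
  \|\hat{\bbrtheta}_{\rho} - \bbrtheta^*\|_{\infty}
  \lesssim \frac{e^{2\kappa_E}}{n}\sqrt{\frac{n}{L}}
  + \frac{e^{\kappa_E}}{n}\sqrt{\frac{(n-1)\log n}{L}}
  \lesssim e^{2\kappa_E}\sqrt{\frac{\log n}{nL}},
\end{equation*}
which tends to $0$ as $n\rightarrow\infty$ even when $L = 1$.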

\subsection{Comparison to other work}\label{sec:comparison-to-other-work}

To the best of our knowledge, \cite{yan2012sparsecompbtl,
  hajek2014minimaxinferencepartialrank,shah2015estimationfrompairwisecomps,negahban2017rankcentralitypairwisecomparisons,
  agarwal2018acceleratedspectralranking,
  hendrickx2019graphresistance,hendrickx2020minimaxpairwisebtl} are the only
existing papers that study estimation error for the \btl{} model on a
comparison graph with general topology. Since
\cite{negahban2017rankcentralitypairwisecomparisons,agarwal2018acceleratedspectralranking,hendrickx2019graphresistance,hendrickx2020minimaxpairwisebtl}
estimate the preference scores $\bfw^{*}$ rather than $\bbrtheta^*$, we
cannot directly compare our results with theirs because there is no tight
two-sided relationship between their metrics of error and ours. Therefore, here
we only compare our results to those in \cite{yan2012sparsecompbtl,
  hajek2014minimaxinferencepartialrank, shah2015estimationfrompairwisecomps}, as
is summarized in \Cref{tab:compare}.
% noting the loss metric used in each.
We include the comparison to the other four papers in
\Cref{sec:comparison_detail}.

\textbf{$\ell_{\infty}$ loss: }\cite{yan2012sparsecompbtl} establish an
$\ell_{\infty}$-bound depending on $n_{ij}$, the number of common neighbors of
item $i$ and item $j$ in the comparison graph, under a strong assumption that
$n_{ij}\geq cn$ for some constant $c\in (0,1)$. This constraint on graph
topology is stronger than ours since it requires the graph to be dense. In
particular, when the comparison graph comes from an Erd\"os-R\'enyi model
$ER(n,p)$, $\min_{i,j}n_{ij}\asymp np^2$. Then the conditions in
\cite{yan2012sparsecompbtl} require $p$ to be bounded away from $0$ and their
bound becomes $\frac{e^\kappa}{p}\sqrt{\frac{\log n}{npL}}$, while our bound is
$\frac{e^{2\kappa_E}}{\sqrt{p}}\sqrt{\frac{\log n}{npL}}$. Our bound is tighter
for moderate or small $\kappa_E$, and importantly, allows $p$ to vanish.
Furthermore, in \Cref{sec:simulations}, we show through specific examples that
$\min_{i,j}n_{ij}$ can be $0$ even for many fairly dense graphs, which
illustrates that the upper bound of \cite{yan2012sparsecompbtl} does not apply
in many realistic settings.
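
The two Erd\"os-R\'enyi rates quoted above can be recovered as follows. This is
a sketch which assumes the standard high-probability concentration
$\lambda_2(\mclL_{\bfA}) \asymp n_{\max} \asymp n_{\min} \asymp np$, valid when
$p \geq C\log n/n$ for a sufficiently large constant $C$, and which treats $r$
as negligible. For the bound of \cite{yan2012sparsecompbtl},
\begin{equation*}
  \frac{e^{\kappa}}{\min_{i,j}n_{ij}}\sqrt{\frac{n_{\max}\log n}{L}}
  \asymp \frac{e^{\kappa}}{np^2}\sqrt{\frac{np\log n}{L}}
  = \frac{e^{\kappa}}{p}\sqrt{\frac{\log n}{npL}},
\end{equation*}
while the first bound of \Cref{eq:thm1_optimal} becomes
\begin{equation*}
  \frac{e^{2\kappa_E}}{np}\sqrt{\frac{n}{L}}
  + \frac{e^{\kappa_E}}{np}\sqrt{\frac{np\log n}{L}}
  = \frac{e^{2\kappa_E}}{\sqrt{p}}\sqrt{\frac{1}{npL}}
  + e^{\kappa_E}\sqrt{\frac{\log n}{npL}}
  \lesssim \frac{e^{2\kappa_E}}{\sqrt{p}}\sqrt{\frac{\log n}{npL}}.
\end{equation*}
The ratio of our bound to theirs is therefore of order
$e^{2\kappa_E - \kappa}\sqrt{p}$.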

\textbf{$\ell_{2}$ loss: }\cite{hajek2014minimaxinferencepartialrank,
  shah2015estimationfrompairwisecomps} consider the constrained MLE
$\hat{\bbrtheta} \defined \argmin_{\|\bbrtheta\|_{\infty}\leq B}\ell_0(\bbrtheta)$
for a known parameter $B$ such that $\| \bbrtheta^*\|_\infty \leq B$. Setting
aside the fact that their results require stricter conditions than ours, our
$\ell_{2}$ bound is tighter than theirs for general parameter settings with
moderate $B,\kappa$ and for a broad range of graphs with moderate
$\lambda_{2}(\mclL_{\bfA})$, \ie, not too sparse or irregular.

\begin{table}[htb!]
  \centering
  \begin{tabular}{c|c|c}
    \hline
    \textbf{Norm} & \textbf{Reference} & \textbf{Upper bound} \\
    \hline\hline
    $\|\cdot\|_\infty$ & \cite{yan2012sparsecompbtl} &
    $\frac{e^{\kappa}}{\min_{i,j}n_{ij}}\sqrt{\frac{n_{\max}\log n}{L}}$ \\
    \cline{2-3}
    & \textbf{Our work} & See \Cref{nthm:thm1} \\
    \hline
    & \cite{hajek2014minimaxinferencepartialrank} &
    $e^{8B}\frac{|E|\log n}{\lambda_2(\mclL_\bfA)^2L}$ \\
    \cline{2-3}
    $\|\cdot\|^2_2$ & \cite{shah2015estimationfrompairwisecomps} &
    $e^{8B}\frac{n\log n}{\lambda_2(\mclL_\bfA)L}$ \\
    \cline{2-3}
    & \textbf{Our work} &
    $\frac{e^{2\kappa_E}}{\lambda_2(\mclL_\bfA)^2} \frac{n_{\max}n}{L}$ \\
    \hline
  \end{tabular}
  \caption{Comparison of results in literature.}
  \label{tab:compare}
\end{table}

We re-emphasize that \cite{hendrickx2020minimaxpairwisebtl} also provide upper
bounds for a general fixed comparison graph that match an instance-wise lower
bound, for their parameter of interest $\mathbf{w}^* \defined
  (e^{\theta_1^*},\ldots,e^{\theta_n^*})^{\top}$, instead of $\bbrtheta^*$.
However, their error metric, \ie, $\sin(\hat{\bfw}, \bfw^*)$, is quite different
from the metrics used in other papers in the \btl{} literature, including our work. As such,
it is not clear how to compare with their results. Furthermore, as noted in
\Cref{sec:introduction}, from the perspective of ranking, an entry-wise metric
like $\|\cdot\|_{\infty}$ is more informative than vector-level metrics like
$\|\cdot\|_2$ and $\sin(\cdot, \cdot)$.

\subsection{Special cases of graph topologies}\label{sec:special-case} We now
examine several common comparison graph topologies and determine the order of the
sample complexity $N_{\textnormal{comp}} =|E|L$ required to achieve
consistency, i.e., $\|\hat{\bbrtheta} - \bbrtheta^*\|_{\infty} = o(1)$. The
results are summarized in \Cref{tab:special_cases}. For path and star graphs, we
use the specialized bounds in \Cref{prop:path,prop:tree}.
\begin{table}[!h]
  \centering
  \begin{tabular}{c|c|c}
    \hline
    \textbf{Graph}                &
    \specialcell{$\mathbf{N_{\textnormal{comp}}}$                                \\
    \citep{yan2012sparsecompbtl}} &
    \specialcell{$\mathbf{N_{\textnormal{comp}}}$                                \\
    \textbf{(Our work)}}                                                         \\
    \hline \hline
    \textbf{Complete}             & $\Omega(n^2)$ & $\Omega(n^2)$                \\
    \textbf{Bipartite}            & \texttt{N/A}  & $\Omega(n^2)$                \\
    \textbf{Path}                 & \texttt{N/A}  & $\omega(e^{2\kappa_E}n^2\log
    n)$                                                                          \\
    \textbf{Star}                 & \texttt{N/A}  & $\omega(e^{2\kappa_E}n \log
    n)$                                                                          \\
    \textbf{Barbell}              & \texttt{N/A}  & $\omega(e^{2\kappa_E}n^5\log
    n)$                                                                          \\
    \hline
  \end{tabular}
  \caption{Magnitude of $N_{\textnormal{comp}}$ to ensure $\|\hat{\bbrtheta} -
    \bbrtheta^*\|_{\infty} = o(1)$.}
  \label{tab:special_cases}
\end{table}
As shown in \Cref{tab:special_cases}, our bound now applies to a much broader
class of graph topologies under the $\ell_{\infty}$-norm compared to
\citet{yan2012sparsecompbtl}.

\bnrmk \label{rmk:sample-complexity} For the path graph, star graph, and barbell
graph, the sample complexity obtained by directly applying our general
$\ell_{\infty}$ bound is larger than the sample complexity induced by the
$\ell_2$ bound in \cite{shah2015estimationfrompairwisecomps}, although the latter
requires more stringent conditions than ours. We therefore provide specialized sharp
upper bounds for path and star graphs in \Cref{prop:path,prop:tree}.
Additionally, in \Cref{sec:implications-of-work}, we illustrate
that by applying a \textit{unique} sub-additivity property of the
$\ell_{\infty}$-loss, we can achieve a much smaller sample complexity in graphs
with bottlenecks, such as the barbell graph.
\enrmk

\textbf{Erd\"os-R\'enyi graph: }By applying a union bound on
$\lambda_2(\mclL_{\bfA})$, $n_{\max}$, and $n_{\min}$ to the sample-wise bounds
in \Cref{nthm:thm1}, we obtain a corollary in the setting where the comparison
graph follows the Erd\"os-R\'enyi model $ER(n,p)$.

\bncor[Erd\"os-R\'enyi graph]\label{cor:cor_ER} Suppose that the comparison
graph is drawn from the Erd\"os-R\'enyi model $ER(n,p)$. Then, under the same
conditions as in \Cref{nthm:thm1}, with
probability at least $1 - O(n^{-4})$, it holds that
\begin{equation*}
  \begin{split}
    \|\hat{\bbrtheta}_{\rho} - \bbrtheta^*\|_{\infty} &\lesssim e^{2\kappa_E} \sqrt{\frac{\log n}{np^2L}}, \\
    \|\hat{\bbrtheta}_{\rho} - \bbrtheta^*\|_2 &\lesssim  {e^{\kappa_E}} \sqrt{\frac{1}{pL}}.
  \end{split}
\end{equation*}
\encor
%\bnrmk \label{nrmk:compare-erdos-renyi}
The full form of \Cref{cor:cor_ER} with a proof can be found at the end of
\Cref{sec:prf_upper_bounds}. For the Erd\"os-R\'enyi comparison graph $ER(n,p)$,
the tightest $\ell_{\infty}$-norm error bound $e^{2\kappa}\sqrt{\frac{\log
      n}{npL}}$ is proved in \cite{chen2019spectralregmletopk} and
\cite{chen2020partialtopkranking}. \cite{han2020asymptoticsparsebradleyterry}
establish an $\ell_{\infty}$-norm upper bound of $e^{2\kappa}\sqrt{\frac{\log
      n}{np}}\cdot \frac{\log n}{\log (np)}$.
\cite{negahban2017rankcentralitypairwisecomparisons} obtain an $\ell_{2}$-norm
upper bound of $e^{4\kappa}\frac{\log n}{pL}$ and a lower bound of $e^{-\kappa}
  \frac{1}{pL}$. Thus the $\ell_{2}$-bound derived in \Cref{cor:cor_ER} is
minimax optimal in the Erd\"os-R\'enyi case.

In this case, our derived $\ell_{\infty}$-bound does not achieve the rate
established in \cite{chen2019spectralregmletopk} and
\cite{chen2020partialtopkranking}, though our $\ell_{2}$-bound exhibits the
optimal rate proved in \cite{negahban2017rankcentralitypairwisecomparisons}. The
reason why our bound does not imply the optimal $\ell_{\infty}$-rate under an
Erd\"os-R\'enyi comparison graph is that it is a sample-wise bound and
thus cannot leverage regularity properties of Erd\"os-R\'enyi graphs, beyond
algebraic connectivity and degree homogeneity, that hold with high
probability.
%\enrmk

\textbf{Tree graphs:} For extremely sparse graphs such as trees, the general
upper bound in \Cref{nthm:thm1} is loose compared to the lower bound in
\Cref{nthm:thm2-lb}. Therefore, as a complement to our general theory, we
separately prove sharp upper bounds for path and star graphs, which are
frequently studied special cases. For example, single-elimination sports tournaments are commonly
designed as a binary tree graph. By the spectral properties of path and star
graphs (see \Cref{sec:appendix-special-case}), one can verify that the
upper bounds in both norms match the $\ell_{\infty}$ lower bound in
\Cref{nthm:thm2-lb} and the $\ell_{2}$ lower bound in
\cite{shah2015estimationfrompairwisecomps}, up to $\sqrt{\log n}$ and
$e^{2\kappa_E}$ factors.

\bnprop[Path graph]
\label{prop:path}
Suppose the comparison graph is a path graph $([n],E)$ with
$E = \{(i, i+1)\}_{i\in [n - 1]}$ and $L>c e^{2\kappa_E}n\log n$ for some
universal constant $c$. Then, with probability at least $1 - n^{-4}$, the vanilla
MLE $\hat{\bbrtheta}_0$ satisfies
\begin{equation*}
  \begin{split}
      \|\hat{\bbrtheta}_0 - \bbrtheta^*\|_{\infty}&\lesssim e^{\kappa_E}\sqrt{\frac{n\log n}{L}}, \\
      \|\hat{\bbrtheta}_0 - \bbrtheta^*\|_{2} &\lesssim e^{\kappa_E}n\sqrt{\frac{\log n}{L}}.
  \end{split}
\end{equation*}
\enprop
\bnprop[General tree graph] 
\label{prop:tree}
Suppose the comparison graph is a tree $([n],E)$ in which
each pair of items $(i,j)\in E$ is compared $L$ times, with $L>c e^{2\kappa_E}n\log
  n$ for some universal constant $c$. Then, with probability at least $1 - n^{-4}$,
the vanilla MLE $\hat{\bbrtheta}_0$ satisfies
\begin{equation*}
  \begin{split}
      \|\hat{\bbrtheta}_0 - \bbrtheta^*\|_{\infty}&\lesssim e^{\kappa_E}\sqrt{\frac{D\log n}{L}},\\
      \|\hat{\bbrtheta}_0 - \bbrtheta^*\|_{2}&\lesssim e^{\kappa_E}\sqrt{\frac{Dn\log n}{L}},
  \end{split}
\end{equation*}
where $D:=\max_{i,j}|{\rm path}(i,j)|$ is the diameter of the tree. In particular,
for a star graph, $D = 2$, so $D$ can be treated as a constant in the bounds above.
\enprop
The full statements of \Cref{prop:path} and \Cref{prop:tree}, together with their
proofs, can be found in \Cref{sec:prf_upper_bounds}. Briefly, the proofs leverage
the closed-form solution of the vanilla MLE on tree graphs.
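
To connect \Cref{prop:path,prop:tree} with the entries of
\Cref{tab:special_cases}, the following back-of-the-envelope calculation reads
off the sample complexity from the $\ell_{\infty}$ rate alone. It uses only that
a tree has $|E| = n-1$ edges and sets aside the separate sample-size condition
on $L$ stated in the propositions, so it should be read as a sketch rather than
a complete argument.
\begin{align*}
  e^{\kappa_E}\sqrt{\frac{D\log n}{L}} = o(1)
   & \iff L = \omega\big(e^{2\kappa_E}D\log n\big), \\
  N_{\textnormal{comp}} = (n-1)L
   & = \omega\big(e^{2\kappa_E}nD\log n\big).
\end{align*}
For the path graph, $D = n-1$ gives $N_{\textnormal{comp}} =
  \omega(e^{2\kappa_E}n^2\log n)$. For the star graph, $D$ is a constant, which
gives $N_{\textnormal{comp}} = \omega(e^{2\kappa_E}n\log n)$. Both agree with
the corresponding rows of \Cref{tab:special_cases}.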

\section{Lower bounds}\label{sec:lower-bounds} In this section, we derive a
minimax lower bound for the $\ell_{\infty}$ loss. To that end, we first
introduce some new notation. Let $N_{\textnormal{comp}}$ be the total number of
comparisons that have been observed, so in our setting, $N_{\textnormal{comp}} =
  |E|L$, where $|E|$ is the number of edges in the comparison graph $\mclG$. Denote the
two items involved in the $i$-th comparison by $(i_1,i_2)$, with $i_1<i_2$.
Let $\tilde{\mclL}_A =
  \frac{1}{N_{\textnormal{comp}}}\sum_{i=1}^{N_{\textnormal{comp}}}(\bfe_{i_1} -
  \bfe_{i_2})(\bfe_{i_1} - \bfe_{i_2})^\top$ be the normalized graph Laplacian
with pseudo inverse $\tilde{\mclL}_A^{\dagger}$ and eigenvalues
$0=\lambda_1(\tilde{\mclL}_A) \leq \lambda_2(\tilde{\mclL}_A)\leq\cdots\leq
  \lambda_n(\tilde{\mclL}_A)$. With the main notation in place, our minimax lower
bound is summarized in the following result.

\bnthm\label{nthm:thm2-lb} Assume that the comparison graph $\mclG$ is connected
and that the sample size satisfies $N_{\textnormal{comp}} \geq \frac{c_{2}
    \operatorname{tr}\left(\tilde{\mclL}_A^{\dagger}\right)}{e^{2\kappa}
    \kappa^{2}}$. Then any estimator $\widetilde{\bbrtheta}$ based on
$N_{\textnormal{comp}}$ comparisons with outcomes from the \btl{} model
satisfies
\begin{align*}
  \sup_{\bbrtheta^*\in \Theta_\kappa} \mathbb{E} & \left[\|\widetilde{\bbrtheta} - \bbrtheta^{*}\|^2_{\infty}\right]
  \gtrsim \frac{e^{-2\kappa}}{n N_{\textnormal{comp}} } \enspace \times                                              \\
                                                 & \max \Big\{{n}^2,
  \max_{n^{\prime} \in\{2, \ldots, n\}}  \sum_{i=\left\lceil 0.99 n^{\prime}\right\rceil}^{n^{\prime}} [\lambda_{i}(\tilde{\mclL}_A)]^{-1} \Big\},
\end{align*}
where $\Theta_{\kappa} = \{\bbrtheta\in \mathbb{R}^n:\mathbf{1}_n^\top \bbrtheta =
  0,\ \|\bbrtheta\|_{\infty}\leq \kappa\}$.
\enthm

The proof of \Cref{nthm:thm2-lb} largely leverages the lower bound construction
from Theorem 2 in \cite{shah2015estimationfrompairwisecomps}. The main
modification in adapting it to our setting is to construct an
$\ell_{\infty}$-packing set. This is done by utilizing the \textit{tight}
topological equivalence of $\ell_{\infty}$ and $\ell_{2}$ norms in finite
dimensions.

% \paragraph{Remark} \bnrmk\label{nrmk:upper-lower-bounds-comparison}
We can compare this lower bound with the upper bound in \Cref{nthm:thm1}. In our
setting, the comparisons are distributed evenly over all edges, so
$N_{\textnormal{comp}} = |E|L$ and $\lambda_i(\tilde{\mclL}_A) = \frac{1}{|E|}
  \lambda_i(\mathcal{L}_A)$. Thus, given a comparison graph with
$\lambda_2(\tilde{\mclL}_A) \asymp \frac{1}{n}$, the lower bound becomes
\[
  \sup _{\bbrtheta^*\in \Theta_{\kappa}} \mathbb{E}\left[\|\widetilde{\bbrtheta}-\bbrtheta^{*}\|_{\infty}\right]
  \gtrsim e^{-\kappa} \sqrt{ \frac{n}{N_{\textnormal{comp}}} }.
\]
In the $ER(n,p)$ case, this lower bound becomes $e^{-\kappa}\sqrt{\frac{1}{npL}}$,
which matches the upper bound in \cite{chen2019spectralregmletopk} up to a
$\sqrt{\log n}$ factor and a factor depending on $\kappa$. For
``regular'' graph topologies with $\lambda_2(\tilde{\mclL}_A) \asymp \frac{1}{n}$,
such as the complete graph, expander graphs with $\phi=\Omega(n)$, and complete bipartite
graphs with two partition sets of size $\Omega(n)$, the upper bound becomes
% \[
% \|\hat{\bbrtheta}_{\rho} - \bbrtheta^*\|_{\infty} \lesssim e^{2\kappa }
% \sqrt{\frac{n}{N_{\textnormal{comp}}}} + e^{\kappa} \sqrt{\frac{n\log
% n}{N_{\rm comp}}}\lesssim e^{2\kappa} \sqrt{\frac{n\log
% n}{N_{\textnormal{comp}}}}.
% \]
\begin{equation*}
  \|\hat{\bbrtheta}_{\rho} - \bbrtheta^*\|_{\infty} \lesssim e^{2\kappa} \sqrt{\frac{n\log n}{N_{\textnormal{comp}}}}.
\end{equation*}
Therefore, when the comparison graph topology is sufficiently regular, our upper
bound matches the lower bound up to a $\sqrt{\log n}$ factor and a factor of
$e^{3\kappa}$. As a final remark,
\cite{negahban2017rankcentralitypairwisecomparisons} show that the minimax lower
bound for the $\ell_{2}$-loss under an Erd\"os-R\'enyi comparison graph $ER(n,p)$ is
$e^{-\kappa} \frac{1}{pL}$, which matches our $\ell_{2}$ upper bound up to a
factor of $e^{2\kappa}$.
%\enrmk
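
The simplification of the lower bound in \Cref{nthm:thm2-lb} used in the
comparison above can be spelled out as follows. The sketch only uses that the
maximum is at least its first argument and that, for $n'\geq 2$, every index in
the sum is at least $2$, so each summand is at most
$[\lambda_2(\tilde{\mclL}_A)]^{-1}$.
\begin{align*}
  \frac{e^{-2\kappa}}{nN_{\textnormal{comp}}}\cdot n^2
   & = e^{-2\kappa}\,\frac{n}{N_{\textnormal{comp}}},              \\
  \sum_{i=\lceil 0.99n'\rceil}^{n'}[\lambda_i(\tilde{\mclL}_A)]^{-1}
   & \leq \frac{0.01n'+1}{\lambda_2(\tilde{\mclL}_A)}
  \lesssim n^2 \quad \text{if } \lambda_2(\tilde{\mclL}_A)\asymp \tfrac{1}{n}.
\end{align*}
Hence, for such graphs the maximum is of order $n^2$, and the squared-loss lower
bound is of order $e^{-2\kappa}n/N_{\textnormal{comp}}$, the square of the rate
displayed above. The square of the upper bound,
$e^{4\kappa}n\log n/N_{\textnormal{comp}}$, exceeds it by a factor of
$e^{6\kappa}\log n$, that is, $e^{3\kappa}\sqrt{\log n}$ on the unsquared scale.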

\section{Implications for tournament design}\label{sec:implications-of-work}

In this section, we discuss how our results can be leveraged to design more
efficient tournaments, from a ranking perspective, in sports leagues.

As discussed in \Cref{sec:special-case}, for some comparison graphs with small
$\lambda_2(\mclL_{\bfA})$, the requirement on $L$ and $N_{\textnormal{comp}}$
for consistency is stringent. However, as we show next, we can significantly
relax the requirement on the sample complexity $N_{\textnormal{comp}}$ by
varying the number of pairwise comparisons observed over different
subsets of the items in a manner that leverages the different degrees of
connectivity of the comparison graph.

The basic idea is that model parameters corresponding to a subset of items
inducing a highly connected sub-graph require relatively few observations. On
the other hand,  the outcomes of comparisons with items corresponding to nodes
of the comparison graph that are part of a ``graph bottleneck'' are especially
important in yielding accurate global ranking and, therefore, should be more
heavily sampled (in the sense of having a larger number $L$ of observations).
The case of a Barbell graph consisting of two complete sub-graphs connected by
a few ``bridge'' edges (as shown in \Cref{fig:barbell-bridge}) is an extreme
illustration of this situation and will be discussed below. In this case, it is
clear that the parameters corresponding to the items adjacent to the bridge edges
ought to be estimated with higher accuracy and therefore, for those items, $L$
should be set larger. Furthermore, it is possible to estimate the model
parameters separately over different sub-graphs and combine these estimators in
a way that could lead to an improved rate, compared to a joint or omnibus
estimator. Indeed, the next result shows that the $\ell_{\infty}$-error rate of
the combined estimator is bounded by the sum of the error rates for estimating
the parameters of the individual sub-graphs.

Formally, let $I_1,I_2,I_3$ be three subsets of $[n]$ such that $\cup_{j=1}^3
  I_j = [n]$ and, for each $j \neq k$, $I_j  \not\subseteq I_k$ and for $i=1,2$,
$I_i \cap I_3 \neq \emptyset$. Assume that the sub-graphs induced by $I_j$'s
are connected and the number of comparisons for all pairs can be different
across sub-graphs. Let $\bbrtheta^*$ be the vector of preference scores in the
\btl{} model over $n$ items and $\hat{\bbrtheta}_{(j)}$ be the MLE of
$\bbrtheta^*_{(j)}\in \mathbb{R}^{|I_j|}$ for the \btl{} model involving only
items in $I_j$, $j=1,2,3$. Also define the augmented version
$\tilde{\bbrtheta}_{(j)}\in \mathbb{R}^n$ such that
$\tilde{\bbrtheta}_{(j)}(I_j) = \hat{\bbrtheta}_{(j)}$.

Now take two nodes $t_1\in I_1\cap I_3$ and $t_2\in I_2\cap I_3$, and let $\delta_3
  = \tilde{\bbrtheta}_{(1)}(t_1) - \tilde{\bbrtheta}_{(3)}(t_1)$ and $\delta_2 =
  \tilde{\bbrtheta}_{(3)}(t_2) - \tilde{\bbrtheta}_{(2)}(t_2)$. An ensemble
estimator, which we call an \textit{add-MLE}, is a vector $\hat{\bbrtheta}\in \mathbb{R}^n$ such
that $\hat{\bbrtheta}(I_1) = \hat{\bbrtheta}_{(1)}$, $\hat{\bbrtheta}(S_2) =
  \tilde{\bbrtheta}_{(2)}(S_2) + \delta_3 + \delta_2$, and $\hat{\bbrtheta}(S_3) =
  \tilde{\bbrtheta}_{(3)}(S_3) + \delta_3$, where $S_2 = I_2\setminus I_1$ and
$S_3 = I_3\setminus (I_1\cup I_2)$. Notice that the value of $\hat{\bbrtheta}$
depends on the choice of $t_1,t_2$, but the estimation error of every such ensemble
estimator is well controlled, as shown in
\Cref{nlem:subbadditivity-ellinf-norm}.

\bnprop[Subadditivity of
  $\ell_{\infty}$-loss in \btl{}]\label{nlem:subbadditivity-ellinf-norm} Under
the setting above, for any add-MLE $\hat{\bbrtheta}\in\mathbb{R}^n$ based on
$\hat{\bbrtheta}_{(1)}, \hat{\bbrtheta}_{(2)}, \hat{\bbrtheta}_{(3)}$, it
holds that
\begin{equation}
  d_{\infty}(\hat{\bbrtheta},\bbrtheta^*) \leq 4\sum_{i=1}^3 d_{\infty}(\hat{\bbrtheta}_{(i)},\bbrtheta^*_{(i)}),
\end{equation}
where $d_{\infty}(\bfv_1,\bfv_2) \defined \|(\bfv_1 - {\rm
  avg}(\bfv_1)\mathbf{1})-(\bfv_2 - {\rm avg}(\bfv_2)\mathbf{1})\|_{\infty}$ and
${\rm avg}(\bfx):=\frac{1}{n}\mathbf{1}_n^\top \bfx$ for $\bfx\in
  \mathbb{R}^n$.
\enprop
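
To see why the shifts $\delta_3$ and $\delta_2$ are a natural choice, one can
check that they align the three sub-graph estimates at the anchor nodes $t_1$
and $t_2$. The following identities follow directly from the definitions of
$\delta_3$ and $\delta_2$ and are included only as a sanity check of the
construction.
\begin{align*}
  \tilde{\bbrtheta}_{(3)}(t_1) + \delta_3
   & = \tilde{\bbrtheta}_{(1)}(t_1),            \\
  \tilde{\bbrtheta}_{(2)}(t_2) + \delta_3 + \delta_2
   & = \tilde{\bbrtheta}_{(3)}(t_2) + \delta_3.
\end{align*}
That is, the shifted estimate on $I_3$ agrees with the estimate on $I_1$ at
$t_1$, and the shifted estimate on $I_2$ agrees with the shifted estimate on
$I_3$ at $t_2$. Since the \btl{} likelihood depends on the scores only through
their differences, such constant shifts do not change the fit within each
sub-graph.
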
The proof of \Cref{nlem:subbadditivity-ellinf-norm} can be found in
\Cref{sec:others}. For some graph topologies, the above result can be
used to devise a {\it divide-and-conquer strategy} for estimating the model
parameters with better sample complexity than that of an omnibus estimator,
i.e., the joint-MLE in our setting. Indeed, as discussed in
\Cref{sec:special-case}, for a barbell graph containing two complete
sub-graphs of size $n/2$ connected by a single edge, we need $N_{\textnormal{comp}} =
  \Omega(n^5\log n)$ for an $o(1)$ error bound of the joint-MLE. From a practical
perspective, such a divide-and-conquer strategy gives flexibility
in the number of comparisons in each sub-graph. For example, suppose we set $L=1$ for
the two complete sub-graphs to get MLEs $\hat{\bbrtheta}_{(1)}$,
$\hat{\bbrtheta}_{(2)}$, set $L = n$ for the two items linking the two
sub-graphs to get an MLE $\hat{\bbrtheta}_{(3)}$, and combine them by shifting
$\hat{\bbrtheta}_{(2)}$ by the difference of the two entries of $\hat{\bbrtheta}_{(3)}$.
Then the total sample complexity is \textit{reduced} to $N_{\textnormal{comp}}=\Omega(n^2)$,
while the $\ell_{\infty}$-norm error is of order $O(e^{2\kappa_E}\sqrt{\log
    n}/\sqrt{n})=o(1)$, because for a complete graph of size $m$, the
$\ell_{\infty}$-norm error is $O(e^{2\kappa_E}\sqrt{\log m}/\sqrt{mL})$. In
\Cref{exa:n_ij=0} and \Cref{subsec:additivity}, we show simulation results
illustrating the advantage of using subadditivity in estimation, where we
generalize the add-MLE to Island graphs and to Barbell graphs with multiple bridge
edges, which can have more than $3$ dense sub-graphs.
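
The count behind this reduction can be made explicit. The following sketch
considers the barbell graph with a single bridge edge, applies the
complete-graph rate quoted above to each of the three sub-problems (including
the two-item bridge, with $m=2$), and ignores constants and the probability
statements; it is meant as an illustration rather than a proof.
\begin{align*}
  N_{\textnormal{comp}}
   & = 2\binom{n/2}{2}\cdot 1 + 1\cdot n
  = \frac{n(n-2)}{4} + n = \frac{n(n+2)}{4},                                  \\
  d_{\infty}(\hat{\bbrtheta},\bbrtheta^*)
   & \leq 4\sum_{i=1}^3 d_{\infty}(\hat{\bbrtheta}_{(i)},\bbrtheta^*_{(i)})   \\
   & \lesssim e^{2\kappa_E}\sqrt{\frac{\log (n/2)}{n/2}}
  + e^{2\kappa_E}\sqrt{\frac{\log 2}{2n}}
  \lesssim e^{2\kappa_E}\sqrt{\frac{\log n}{n}}.
\end{align*}
The first line shows that the design uses of order $n^2$ comparisons in total,
and the remaining lines combine the three sub-graph errors through
\Cref{nlem:subbadditivity-ellinf-norm}.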

Note that such flexible tournament design is similar to the idea of
\textit{active ranking} \citep{heckel2019activerankingpairwisecomps,
  ren2019activeranking}, but there is still a substantial difference between our
setting and active ranking. Active ranking assumes that one can design the
tournament in an \textit{online} manner, so that the next pair of items to be
compared is determined by the most recent comparison outcomes. However, in
practice many tournaments can only be designed \textit{offline}, \ie, before any
games are played. Under this common setting, our $\ell_{\infty}$-subadditivity
property provides a useful offline approach to efficient tournament design.


\section{Examples and simulations}\label{sec:simulations} In this section, we
conduct numerical experiments on simulated data with two main goals. First, we
illustrate the utility of the subadditivity property in
\Cref{nlem:subbadditivity-ellinf-norm} in the case of Island graphs (see
\Cref{exa:n_ij=0}). Second, we demonstrate the relative tightness of our
$\ell_\infty$ upper bound compared to \cite{yan2012sparsecompbtl}, since their
work is closest in spirit to ours. Specifically, we compare the two bounds in a
setting where analytical comparison is not directly feasible (see
\Cref{exa:bridge}). All of our reproducible code is openly
accessible\footnote{Repo: \url{https://github.com/MountLee/btl_mle_l_inf}}.

In the \btl{} model, the maximal winning probability is $p_{\max}(\kappa) =
  1/({1 + e^{-\kappa}})$. To get a sense of scale, $p_{\max}(2.20) = 0.900$ and
$p_{\max}(4.59) = 0.990$. A winning probability larger than 0.99 is fairly
rare in practice, so it is not too restrictive to set $\kappa = 2.2$ in
our simulations. Analytically, however, our results allow $\kappa$ to diverge with
$n$.
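
The two numerical values above can be recovered by inverting the expression for
$p_{\max}$. The short calculation below is included only as a sanity check; it
uses natural logarithms and rounds to three decimal places.
\begin{align*}
  p = \frac{1}{1+e^{-\kappa}}
   & \iff \kappa = \log\frac{p}{1-p},            \\
  p = 0.9:\ \kappa = \log 9 \approx 2.197,
   & \qquad p = 0.99:\ \kappa = \log 99 \approx 4.595.
\end{align*}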

In our experiments, we set $\theta^*_i = \theta_1^* + (i-1)\delta$ for $i>1$
with $\delta = \kappa / (n-1)$, and we choose $\theta_1^*$ so
that $\mathbf{1}_n^\top \bbrtheta^* = 0$, for parameter identifiability. Under
this setting, for some special graphs, e.g., the Island graph in
\Cref{exa:n_ij=0}, $\kappa_E$ can be much smaller than $\kappa$. This shows an
advantage of our upper bound, which depends on the maximal
performance gap $\kappa_E$ along the edge set rather than on the gap $\kappa$ over the whole
vertex set. However, there may be cases where the majority of edges have
small performance gaps and only a few edges have large gaps. In such cases, controlling
the upper bound purely through $\kappa_E$ can again be loose. An interesting
future direction is to tighten the upper bounds in such cases by including more
structural parameters, such as the proportion of small-gap edges. We include some
illustrative examples
% \footnote{All of the simulation results in this paper were run on a personal
% laptop with Windows10 OS and Intel Core i7-8850H CPU. The total computation
% time for a single run is approximately 30 minutes.}
in \Cref{sec:additional-experiments}.
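
For concreteness, the centering constraint determines $\theta_1^*$ in closed
form. The short derivation below follows directly from the two conditions
$\theta^*_i = \theta_1^* + (i-1)\delta$ and $\mathbf{1}_n^\top \bbrtheta^* = 0$
and introduces no further assumptions.
\begin{align*}
  0 = \sum_{i=1}^n \theta_i^*
   & = n\theta_1^* + \delta\sum_{i=1}^n (i-1)
  = n\theta_1^* + \delta\,\frac{n(n-1)}{2},            \\
  \theta_1^*
   & = -\frac{\delta(n-1)}{2} = -\frac{\kappa}{2},     \\
  \theta_i^*
   & = -\frac{\kappa}{2} + \frac{(i-1)\kappa}{n-1},
  \qquad \theta_n^* - \theta_1^* = \kappa.
\end{align*}
The scores are therefore equally spaced on $[-\kappa/2,\kappa/2]$, and the
largest gap between any two items is exactly $\kappa$.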

\begin{figure}[!h]
  \centering
  \includegraphics[width=0.4\textwidth]{adj_island.pdf}
  \includegraphics[width=0.4\textwidth]{island_additivity_diff_ni50_no5_k5_L10.pdf}
  \caption{Top left: Adjacency matrix of a 3-Island graph, with yellow indicating 1
    and purple indicating 0; $\lambda_2(\mclL_{\bfA}) = 11.92$. Top right: Adjacency
    matrix of a general Island graph, with $n_{\text{island}} = 30$,
    $n_{\text{overlap}} = 5$, $n = 120$; $\lambda_2(\mclL_{\bfA}) = 1.19$.
    Bottom: Comparison of the error of the joint-MLE and the add-MLE. Each curve
    is the average of 100 trials, with one standard deviation shown
    by the colored area.}
  \label{fig:island}
\end{figure}

\bnexa[Graph with $\min_{i,j}n_{ij}=0$] \label{exa:n_ij=0} In this example, we
illustrate that $\min_{i,j}n_{ij}$ can be $0$ or quite close to $0$
even for fairly dense graphs, making the upper bound in
\cite{yan2012sparsecompbtl} less effective. Consider a \textit{3-Island}
comparison graph $\mclG$ with $n$ nodes. The induced sub-graphs on node sets
$V_1$, $V_2$, $V_3$ with $|V_i| = n_i$ are complete graphs, where $V_1\cap V_3 =
  \emptyset$, $V_1\cup V_2\cup V_3 = [n]$, and $V_i\cap V_2\neq \emptyset$ for $i
  = 1,3$. There are no edges other than those within $V_1,V_2,V_3$. This graph
$\mclG$ is connected and can be fairly dense if we make $n_2$ large, but
$\min_{i,j}n_{ij}=0$ always holds since $V_1\cap V_3 = \emptyset$ and the two
induced sub-graphs are complete. See the top-left panel of \Cref{fig:island} for a
visualization of the adjacency matrix of such a graph.

We can also consider more general \textit{Island} graphs. A general Island graph
is determined by $n$, the size of the graph, $n_{\text{island}}$, the size of
each island sub-graph, and $n_{\text{overlap}}$, the number of overlapping nodes
between adjacent islands. Each island sub-graph is a complete graph, and there are no edges
outside the islands. For Island graphs, it holds that $\min_{i,j}n_{ij}=0$ and
$\kappa_E\approx \kappa \cdot n_{\text{island}}/n$. \Cref{fig:island} top panel
shows the adjacency matrix of two Island graphs. \Cref{fig:island} bottom panel
shows the comparison of the $\ell_{\infty}$-error of the joint-MLE and the
add-MLE (see the detailed definition in \Cref{sec:additional-experiments}) while
varying the difference in the average of preference scores of each island
sub-graph, where we set $n_{\text{island}} = 50, n_{\text{overlap}} = 5,L = 10$. Every
point on the lines is the average of 100 trials. It can be seen that the add-MLE
by the divide-and-conquer strategy largely dominates the joint-MLE in
$\ell_{\infty}$-error.
\enexa
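
The approximation $\kappa_E\approx \kappa \cdot n_{\text{island}}/n$ can be
checked against the linear score design $\theta^*_i = \theta_1^* +
  (i-1)\kappa/(n-1)$ used in our experiments. The sketch below assumes that each
island consists of $n_{\text{island}}$ consecutively indexed items, which is an
assumption about the node labeling made here for illustration. The largest gap
along an edge is then attained between the first and the last item of an island:
\begin{equation*}
  \kappa_E = (n_{\text{island}}-1)\,\frac{\kappa}{n-1}
  \approx \kappa\cdot\frac{n_{\text{island}}}{n}.
\end{equation*}
For instance, with $n_{\text{island}} = 30$ and $n = 120$, this gives $\kappa_E
  = 29\kappa/119 \approx 0.24\,\kappa$.
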
In \Cref{exa:n_ij=0}, we show a common family of graphs that are fairly dense
but have $\min_{i,j}n_{ij}=0$, so that the upper bound in
\cite{yan2012sparsecompbtl} does not hold. Next, in \Cref{exa:bridge}, we consider
another family of graphs for which their upper bound holds but is still
looser than ours.

\bnexa[Barbell graph with random \textit{bridge} edges] \label{exa:bridge}
Consider a generalized Barbell graph $\mclG$ containing $n = n_1 + n_2$ nodes,
where the induced sub-graphs on nodes $\{1,\ldots,n_1\}$ and $\{n_1+1,\ldots,n\}$
are complete graphs, and the two sub-graphs are connected by bridge edges
$(i,j)$ with $1\leq i\leq n_1$ and $n_1 + 1\leq j\leq n$. Denote the set of
bridge edges by $E_l$. Then $|E_l|/(n_1 n_2)$ quantifies the connectivity of
$\mclG$: the larger $|E_l|/(n_1 n_2)$ is, the denser and more regular $\mclG$ is.

\begin{figure}[htb!]
  \centering
  \includegraphics[width=0.46\textwidth]{barbell_graph_32_10_paper.pdf}
  \includegraphics[width=0.44\textwidth]{barbell_ratio.pdf}
  \caption{Top: Visualization of a Barbell graph with random bridge edges.
    Bottom: The ratio of our bound to the bound in \cite{yan2012sparsecompbtl}
    under the Barbell graph with random bridge edges, with sub-graph size $n_1 =
      n_2=n_s$ varying. The curve is the average of 100 trials, with
    one standard deviation shown by the colored area.}
  \label{fig:barbell-bridge}
\end{figure}

In \Cref{fig:barbell-bridge} we show a comparison of the real
$\ell_{\infty}$-loss $\|\hat{\bbrtheta} - \bbrtheta^*\|_\infty$ and the upper
bounds on the $\ell_{\infty}$-error in \cite{yan2012sparsecompbtl} and in our paper. We
include this relative comparison to demonstrate numerically that our bound is in
general tighter than that of \citet{yan2012sparsecompbtl}, since there is no
\textit{known} analytical relationship between $\min_{i,j}n_{ij}$ and
$\lambda_2(\mathcal{L}_A)$ for general graphs. In our experiment, we set $n_1 =
  n_2 = n_s$ and $L =10$, randomly link $|E_l|=n_1 n_2 p$ edges between the two complete
sub-graphs, and vary $n_s$ from $50$ to $1000$ with $p = 3\log(n_s)/n_s$. Every
point on the line is the average of 100 trials. It can be seen that our upper
bound vanishes at a faster rate than that of \cite{yan2012sparsecompbtl} in
this simulated scenario: the plotted ratio of our upper bound
to the upper bound in \citet{yan2012sparsecompbtl} decreases
steadily as $n$ increases. It should be noted that there are leading
constant factors in both upper bounds, and for convenience we set them to $1$
for both bounds. Thus, one should focus on the trend of the curve rather than
the magnitude of the ratio in \Cref{fig:barbell-bridge}.
\enexa
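
To give a sense of how sparse the bridge set is in this experiment, the
following calculation evaluates $p$ and $|E_l|$ at the two ends of the range of
$n_s$. It assumes that $\log$ denotes the natural logarithm, which the text does
not state explicitly, and all values are rounded.
\begin{align*}
  |E_l| = n_s^2\, p            & = 3 n_s\log n_s,                          \\
  n_s = 50:\quad p             & \approx 0.235,\quad |E_l|\approx 587
  \ \text{ of } 2500 \text{ possible bridge edges},                        \\
  n_s = 1000:\quad p           & \approx 0.0207,\quad |E_l|\approx 2.07\times 10^4
  \ \text{ of } 10^6.
\end{align*}
Under this assumption, the fraction of bridge edges shrinks as $n_s$ grows,
while each node is incident to $3\log n_s$ bridge edges on average.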



\section{Discussion}\label{sec:discussion}

In this work, we provide a sharp risk analysis of the MLE for the \btl{} global
ranking model, under a more general graph topology, in the $\ell_{\infty}$-loss.
This addresses a major gap in the \btl{} literature by extending the comparison
graph to more general and thus more practical settings, compared to the dense-graph
setting in \citet{yan2012sparsecompbtl}. Specifically, we derive novel upper
bounds for the $\ell_{\infty}$- and $\ell_2$-loss of the \btl{}-MLE, showing
explicit dependence on the algebraic connectivity of the graph, the sample
complexity, and the maximal performance gap between compared items. We also
derive lower bounds for the $\ell_{\infty}$-loss and analyze specific topologies
under which the MLE is nearly minimax optimal. In addition, we show that the
$\ell_{\infty}$-loss satisfies a unique subadditivity property for the \btl{}
MLE and use our derived bounds for efficient tournament design. We note that
our upper bound is suboptimal when the graph topology is extremely
sparse or irregular. Although we provide sharp upper bounds for path and star
graphs as separate propositions, our bounds are not yet optimal for other such graph
topologies. A natural future direction is to tighten the upper and lower
bounds in these comparison graph regimes. Another promising direction is to
extend this analysis to multi-user ranking models, as in
\citet{jin2020rankaggviahetthurstmod}.

\noindent{\bf Acknowledgments}\label{subsec:acknowledgments}

We would like to thank Heejong Bong from the Carnegie Mellon University (CMU)
Department of Statistics \& Data Science, for his valuable feedback and
discussions during this work. We would also like to thank the anonymous
reviewers for their feedback which greatly helped improve our exposition.

\bibliography{refs}

\end{document}
