Finite-Sample Valid Rank Confidence Sets for a Broad Class of Statistical and Machine Learning Models
Abstract: Ranking populations such as institutions based on certain characteristics is often of interest,
and these ranks are typically estimated using samples drawn from the populations. Due to
sample randomness, it is important to quantify the uncertainty associated with the estimated
ranks. This is especially important when the latent characteristics are poorly separated, so that many
rank estimates may be incorrectly ordered. Quantifying this uncertainty helps identify and
mitigate such errors and gives a fuller picture. However, this task is especially challenging
because the rank parameters are discrete and the central limit theorem does not apply to the
rank estimates. In this article, we propose a Repro Samples Method to address this nontrivial
inference problem by developing a confidence set for the true, unobserved population ranks.
This method provides finite-sample coverage guarantees and is broadly applicable to ranking
problems. The effectiveness of the method is illustrated and compared with several published
large-sample ranking approaches using simulation studies and real data examples involving
samples from both traditional statistical models and modern data science algorithms.