['6,7c6,7', '< Learning-based predictive algorithms are widely used in real-world systems and have significantly impacted our daily lives. However, many algorithms are deployed without sufficient testing or a thorough understanding of likely failure modes. This is especially worrisome in high-stakes application areas such as healthcare, finance, and autonomous transportation. In order to address this critical challenge and provide tools for rigorous system evaluation prior to deployment, there has been a rise in techniques offering explicit and finite-sample statistical guarantees that hold for any unknown data distribution and black-box algorithm, a paradigm known as distribution-free uncertainty quantification (DFUQ). In [1], a framework is proposed for selecting a model based on bounds on expected loss produced using validation data. Subsequent work [30] goes beyond expected loss to provide distribution-free control for a class of risk measures known as quantile-based risk measures (QBRMs) [8]. This includes (in addition to expected loss): median, value-at-risk (VaR), and conditional value-at-risk (CVaR) [23]. For example, such a framework can be used to get bounds on the 80th percentile loss or the average loss of the 10% worst cases.', "< While this is important progress towards the sort of robust system verification necessary to ensure the responsible use of machine learning algorithms, in some scenarios measuring the expected loss or value-at-risk is not enough. As models are increasingly deployed in areas with long-lasting societal consequences, we should also be concerned with the dispersion of error across the population, or the extent to which different members of a population experience unequal effects of decisions made based on a model's prediction. 
For example, a system for promoting content on a social platform may offer less appropriate recommendations for the long tail of niche users in service of a small set of users with high and typical engagement, as shown in [19]. This may be undesirable from both a business and societal point of view, and thus it is crucial to rigorously validate such properties in an algorithm prior to deployment and understand how the outcomes disperse. To this end, we offer a novel study providing rigorous distribution-free guarantees for a broad class of functionals including key measures of statistical dispersion in society. We consider both differences in performance that arise between different demographic groups as well as disparities that can be identified even if one does not have reliable demographic data or chooses not to collect them due to privacy or security concerns. Well-studied risk measures that fit into our framework include the Gini coefficient [33] and other functions of the Lorenz curve as well as differences in group measures such as the median [5]. See Figure 4 for a further illustration of loss dispersion.", '---', '> Learning-based predictive algorithms have become ubiquitous in real-world systems, profoundly impacting daily life. However, their widespread deployment often precedes sufficient testing and a comprehensive understanding of potential failure modes. This concern is particularly acute in high-stakes domains such as healthcare, finance, and autonomous transportation. To address this critical challenge and equip practitioners with robust evaluation tools prior to deployment, the field of distribution-free uncertainty quantification (DFUQ) has emerged. DFUQ provides explicit, finite-sample statistical guarantees that hold irrespective of the underlying data distribution or the specific black-box algorithm used. Early work, such as [1], established a framework for model selection based on bounds on expected loss derived from validation data. 
Subsequent advancements [30] extended this paradigm beyond expected loss to encompass a broader class of quantile-based risk measures (QBRMs) [8], including median, Value-at-Risk (VaR), and Conditional Value-at-Risk (CVaR) [23]. For instance, such frameworks enable the derivation of bounds for the 80th percentile loss or the average loss experienced by the 10% worst-performing cases.', '> While these developments represent significant progress toward robust system verification for responsible machine learning, focusing solely on expected loss or value-at-risk is often insufficient. As algorithmic models increasingly influence areas with long-lasting societal consequences, it becomes imperative to understand and control the dispersion of error across the population. This refers to the extent to which different individuals or groups experience unequal effects from algorithmic decisions. Consider, for example, a social platform\'s content promotion system that might inadvertently provide less relevant recommendations to a "long tail" of niche users, prioritizing a small set of highly engaged users, as observed in [19]. Such outcomes are undesirable from both business and ethical standpoints, necessitating rigorous validation of these distributional properties before deployment. To this end, we present a novel study offering rigorous distribution-free guarantees for a broad spectrum of statistical functionals, specifically targeting key measures of societal statistical dispersion. Our approach accounts for performance disparities arising between distinct demographic groups, as well as inequalities identifiable even without explicit demographic data, addressing privacy and security concerns. Well-established dispersion measures, such as the Gini coefficient [33] and other functions of the Lorenz curve, alongside differences in group-specific metrics like the median [5], are naturally accommodated within our framework. 
Figure 4 provides a further illustration of loss dispersion.', '9,10c9,10', '< In order to provide rigorous guarantees for socially important measures that go beyond expected loss or other QBRMs, we provide two-sided bounds for quantiles and nonlinear functionals of quantiles. Our framework is simple yet flexible and widely applicable to a rich class of nonlinear functionals of quantiles, including Gini coefficient, Atkinson index, and group-based measures of inequality, among many others. Beyond our method for controlling this richer class of functionals, we propose a novel numerical optimization method that significantly tightens the bounds when data is scarce, extending earlier techniques [21,30]. We conduct experiments on toxic comment moderation, detecting genetic mutations in cell images, and online content recommendation, to study the impact of our approach to model selection and tailored bounds.', '< To summarize our contributions, we: (1) initiate the study of distribution-free control of societal dispersion measures; (2) generalize the framework of [30] to provide bounds for nonlinear functionals of quantiles; (3) develop a novel optimization method that substantially tightens the bounds when data is scarce; (4) apply our framework to high-impact NLP, medical, and recommendation applications.', '---', '> To offer rigorous guarantees for these socially significant measures that extend beyond expected loss or other QBRMs, we introduce a framework for obtaining two-sided bounds for quantiles and, crucially, for nonlinear functionals of quantiles. Our framework is designed to be simple yet highly flexible and broadly applicable to a rich array of nonlinear quantile functionals, including the Gini coefficient, Atkinson index, and various group-based inequality measures, among others. Beyond enabling control over this richer class of functionals, we propose a novel numerical optimization method. 
This method significantly tightens the derived bounds, particularly in scenarios with limited data, thereby extending and enhancing earlier techniques [21,30]. We validate our approach through comprehensive experiments in toxic comment moderation, detection of genetic mutations in cell images, and online content recommendation, demonstrating its impact on robust model selection and the generation of tailored bounds.', '> To summarize our contributions, we: (1) initiate the systematic study of distribution-free control for societal dispersion measures; (2) generalize the framework of [30] to provide rigorous bounds for nonlinear functionals of quantiles; (3) develop a novel numerical optimization method that substantially tightens these bounds, especially under data scarcity; (4) apply our robust framework to high-impact applications in natural language processing (NLP), medical imaging, and recommendation systems.', '16c16', '< In this section, we motivate our method by studying some widely-used measures of societal statistical dispersion. There are key gaps between the existing techniques for bounding QBRMs and those needed to bound many important measures of statistical dispersion. We first define a QBRM: Definition 1 (Quantile-based Risk Measure). Let ψ(p) be a weighting function such that ψ(p) ≥ 0 and 1 0 ψ(p) dp = 1. The quantile-based risk measure defined by ψ is', '---', '> This section delineates the motivation for our methodology by examining widely-used measures of societal statistical dispersion. We highlight critical discrepancies between existing techniques for bounding Quantile-Based Risk Measures (QBRMs) and the broader requirements for bounding many important dispersion measures. We begin by formally defining a QBRM: Definition 1 (Quantile-based Risk Measure). Let ψ(p) be a weighting function such that ψ(p) ≥ 0 and ∫₀¹ ψ(p) dp = 1. 
The quantile-based risk measure defined by ψ is', '18,19c18,19', '< A QBRM is a linear functional of F -, but quantifying many common group-based risk dispersion measures (e.g. Atkinson index) also involves forms like nonlinear functions of the (inverse) CDF or nonlinear functionals of the (inverse) CDF, and some (like maximum group differences) further involve nonlinear functions of functionals of the loss CDF. Thus a much richer framework for achieving bounds is needed here.', '< For clarity, we use J as a generic term to denote either the CDF F or its inverse F -depending on the context, and summarize the building blocks as below: (i) nonlinear functions of J, i.e. ξ(J); (ii) functionals in the form of integral of nonlinear functions of J, i.e. ψ(p)ξ(J(p))dp for a weight function ψ; (iii) composed functionals as nonlinear functions of functionals for the functional T (J) with forms in (ii), i.e. ζ(T (J)) for a non-linear function ζ.', '---', '> While a QBRM represents a linear functional of F⁻¹, many common group-based risk dispersion measures (e.g., the Atkinson index) necessitate bounding forms that include nonlinear functions of the (inverse) CDF, or nonlinear functionals of the (inverse) CDF. Furthermore, some measures (such as maximum group differences) involve nonlinear functions of functionals of the loss CDF. Consequently, a significantly richer and more flexible framework for achieving these bounds is required.', '> To clarify our approach, we use J as a generic term to denote either the CDF F or its inverse F⁻¹, depending on the specific context. 
We summarize the fundamental building blocks of these complex functionals as follows: (i) nonlinear functions of J, denoted as ξ(J); (ii) functionals expressed as integrals of nonlinear functions of J, specifically ∫ ψ(p)ξ(J(p))dp for a given weight function ψ; and (iii) composed functionals, which are nonlinear functions of functionals, such as ζ(T (J)) where T (J) takes a form described in (ii) and ζ is a nonlinear function.', '22,32c22,32', '< We start by introducing some classic non-group-based measures of dispersion. Those measures usually quantify wealth or consumption inequality within a social group (or a population) instead of quantifying differences among groups. Note that for all of these measures we only consider non-negative losses X, and assume that', '< 1 0 F -(p)dp > 0 1 .', '< Gini family of measures. Gini coefficient [33,34] is a canonical measure of statistical dispersion, used for quantifying the uneven distribution of resources or losses. It summarizes the Lorenz curve introduced in Figure 4. From the definition of Lorenz curve, the greater its curvature is, the greater inequality there exists; the Gini coefficient is measuring the ratio of the area that lies between the line of equality (the 45 • line) and the Lorenz curve to the total area under the line of equality. Definition 2 (Gini coefficient). For a non-negative random variable X, the Gini coefficient is', '< G(X) := E|X -X ′ | 2EX = 1 0 (2p -1)F -(p)dp 1 0 F -(p)dp', '< , where X ′ is an independent copy of X. G(X) ∈ [0, 1], with 0 indicating perfect equality.', '< Because of the existence of the denominator in the Gini coefficient calculation, unlike in QBRM we need both an upper and a lower bound for F -(see Section 4.1.1). In the appendix, we also introduce the extended Gini family.', '< Atkinson index. 
The Atkinson index [2,19] is another renowned dispersion measure defined on the non-negative random variable X (e.g., income, loss), and improves over the Gini coefficient in that it is useful in determining which end of the distribution contributes most to the observed inequality by choosing an appropriate inequality-aversion parameter ε ≥ 0. For instance, the Atkinson index becomes more sensitive to changes at the lower end of the income distribution as ε increases. Definition 3 (Atkinson index). For a non-negative random variable X, for any ε ≥ 0, the Atkinson index is defined as the following if ε ̸ = 1:', '< A(ε, X) := 1 - (E[X 1-ε ]) 1 1-ε E[X] = 1 - 1 0 (F -(p)) 1-ε dp 1 1-ε 1 0 F -(p)dp .', '< And for ε = 1, A(1, X) := lim ε→1 A(ε, X), which will converge to a form involving the geometric mean of X. A(ε, X) ∈ [0, 1], and 0 indicates perfect equality (see appendix for details).', "< The form of Atkinson index includes a nonlinear function of F -, i.e. (F -) 1-ε , but this type of nonlinearity is easy to tackle since the function is monotonic w.r.t. the range of F -(see Section 4.2.1). Remark 1. The reason we study the CDF of X and not X 1-ε is that it allows us to simultaneously control the Atkinson index for all ε's.", '< In addition, there are many other important measures of dispersion involving more complicated types of nonlinearity such as the quantile of extreme observations and mean of range. Those measures are widely used in forecasting weather events or food supply. We discuss and formulate these dispersion measures in the appendix.', '---', '> We begin by introducing several classic non-group-based measures of dispersion. These measures typically quantify inequality in quantities such as wealth or consumption within a single social group or an entire population, rather than differences between groups. 
It is important to note that for all these measures, we exclusively consider non-negative loss values X, and we assume that the expected value of the loss is positive:', '> ∫₀¹ F⁻¹(p) dp > 0.', '> Gini family of measures. The Gini coefficient [33,34] stands as a canonical measure of statistical dispersion, widely employed to quantify the uneven distribution of resources or losses. It provides a concise summary of the Lorenz curve, which is visually represented in Figure 4. The degree of curvature in the Lorenz curve directly corresponds to the magnitude of inequality; specifically, the Gini coefficient measures the ratio of the area between the line of perfect equality (the 45-degree line) and the Lorenz curve, to the total area under the line of equality. Definition 2 (Gini coefficient). For a non-negative random variable X, the Gini coefficient is formally defined as:', '> G(X) := E|X - X′| / (2 E[X]) = ∫₀¹ (2p - 1) F⁻¹(p) dp / ∫₀¹ F⁻¹(p) dp', '> , where X′ denotes an independent copy of X. The Gini coefficient G(X) takes values in [0, 1], with a value of 0 signifying perfect equality.', "> Due to the presence of a denominator in the Gini coefficient's calculation, unlike QBRMs, bounding this measure necessitates both upper and lower bounds for F⁻¹ (as detailed in Section 4.1.1). We further elaborate on the extended Gini family in the appendix.", '> Atkinson index. The Atkinson index [2,19] is another prominent dispersion measure, also defined for non-negative random variables X (e.g., income, loss). It offers an advantage over the Gini coefficient by allowing for the specification of an inequality-aversion parameter ε ≥ 0, which helps in determining which end of the distribution contributes most to the observed inequality. For example, as ε increases, the Atkinson index becomes more sensitive to changes at the lower end of the income distribution. Definition 3 (Atkinson index). 
For a non-negative random variable X, and for any ε ≥ 0, the Atkinson index is defined as follows when ε ≠ 1:', '> A(ε, X) := 1 - (E[X^(1-ε)])^(1/(1-ε)) / E[X] = 1 - (∫₀¹ (F⁻¹(p))^(1-ε) dp)^(1/(1-ε)) / ∫₀¹ F⁻¹(p) dp.', '> For the special case where ε = 1, A(1, X) := lim ε→1 A(ε, X), which converges to a form involving the geometric mean of X. The Atkinson index A(ε, X) also takes values in [0, 1], with 0 indicating perfect equality (further details are provided in the appendix).', '> The mathematical form of the Atkinson index involves a nonlinear function of F⁻¹, specifically (F⁻¹)^(1-ε). However, this type of nonlinearity is manageable because the function is monotonic with respect to the range of F⁻¹ (see Section 4.2.1). Remark 1. Our decision to study the CDF of X rather than X^(1-ε) is motivated by its ability to enable simultaneous control of the Atkinson index for all possible values of ε.', '> Beyond these, there exist numerous other significant dispersion measures that incorporate more complex types of nonlinearity, such as the quantiles of extreme observations and the mean of the range. These measures are frequently utilized in critical applications like forecasting weather events or managing food supply. We discuss and formally define these additional dispersion measures in the appendix.', '657d656', '< ']
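The two dispersion measures defined in the hunks above (Definitions 2 and 3) can be illustrated with plug-in estimates on a finite sample of losses. This is only a minimal sketch: the function names `gini` and `atkinson` are hypothetical, the code evaluates the definitions on the empirical distribution, and it does not implement the distribution-free bounds the text describes.

```python
import math


def gini(losses):
    """Plug-in Gini coefficient E|X - X'| / (2 E[X]) on the empirical distribution."""
    x = sorted(float(v) for v in losses)
    n = len(x)
    mean = sum(x) / n
    if x[0] < 0 or mean <= 0:
        raise ValueError("losses must be non-negative with a positive mean")
    # For sorted values x_(1) <= ... <= x_(n):
    #   sum_{i,j} |x_i - x_j| = 2 * sum_i (2i - n - 1) * x_(i)
    weighted = sum((2 * i - n - 1) * v for i, v in enumerate(x, start=1))
    return weighted / (n * n * mean)


def atkinson(losses, eps):
    """Plug-in Atkinson index for inequality-aversion parameter eps >= 0."""
    x = [float(v) for v in losses]
    n = len(x)
    mean = sum(x) / n
    if eps < 0 or min(x) < 0 or mean <= 0:
        raise ValueError("need eps >= 0 and non-negative losses with positive mean")
    if eps >= 1 and min(x) == 0:
        # Assumption of this sketch: avoid log(0) and negative powers of zero.
        raise ValueError("strictly positive losses are required when eps >= 1")
    if eps == 1:
        # Limit eps -> 1: one minus geometric mean over arithmetic mean.
        geometric_mean = math.exp(sum(math.log(v) for v in x) / n)
        return 1.0 - geometric_mean / mean
    power_mean = (sum(v ** (1 - eps) for v in x) / n) ** (1 / (1 - eps))
    return 1.0 - power_mean / mean
```

For a sample in which every loss is equal, both functions return 0, matching the "perfect equality" end of the [0, 1] range stated in the definitions.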
