

\section{Introduction}

Interest in applying Machine Learning (ML) models in industrial settings has increased in recent years, driven in particular by the success of deep neural networks and the availability of large amounts of data. In general, predictive ML models are optimized to capture the behaviour of a target variable based on a finite set of observations. One of the major concerns when deploying these models in real-world decision-making processes is how to quantify the uncertainty in their predictions, especially in high-stakes domains such as health care or finance, where mistakes carry a severe penalty.

Typical ML models produce point predictions (e.g., expected values for regression or the most likely label for classification). These are not a priori informative about the range of values the target variable can take under normal operation (i.e., the set of values expected to occur with high probability). Calibrated prediction sets (or intervals) can be of great value to a decision maker who wants to consider worst-case scenarios. Moreover, understanding how the uncertainty of a model's prediction differs across varying subsets of the available data can inform the data collection process, model improvements or model selection/assessment.

\begin{figure}[h!]
\centering

\subfloat[Non-conformity region-based prediction framework]{
\includegraphics[width=0.9\columnwidth]{UAI2024/figures/regionbased_scp_scheme.png}
\label{fig:regionbased_scp_scheme}} 

\subfloat[Split Conformal Prediction]{
\includegraphics[width=0.45\columnwidth]{UAI2024/figures/SCP_regression.png}
\label{fig:SCP_regression}} 
\subfloat[Region-based SCP]{
\includegraphics[width=0.45\columnwidth]{UAI2024/figures/region_based_SCP_regression.png}
\label{fig:region_based_SCP_regression}} 

\begin{center}
\caption{ (\ref{fig:regionbased_scp_scheme}) Overview of proposed framework to produce prediction intervals/sets. We first decompose the input space 
% \footnote{optionally on a representation space, $\phi(\mathcal{X})$} 
into interpretable groups 
% \footnote{we use the terms regions, groups, partitions and clusters interchangeably}
where each group contains homogeneous predictions of the $(1-\alpha)$-th quantile of the non-conformity scores of an ML model's prediction $f(\cdot)$. We then build a prediction interval/set, denoted $C_{\tau}(X) = C_{\alpha}(X,g_{\tau}(X))$, where $C_{\alpha}$ is the group-conditional conformal predictor, which depends on both the input $X$ and the group prediction $g_{\tau}(X)$. $C_{\tau}$ satisfies group conditional coverage guarantees for the identified groups. (\ref{fig:SCP_regression}) Regression example with heteroskedastic uncertainty in the model's prediction (blue line); the x-axis indicates the input variable, the y-axis the target variable, and the red dots the test samples. Prediction bands (blue) are produced by standard SCP with a coverage target of 0.95 ($\alpha = 0.05$); the desired coverage is achieved on average, but there is significant disparity across regions of $X$. (\ref{fig:region_based_SCP_regression}) Prediction bands obtained with the proposed region-based approach in conjunction with SCP. Five groups were identified, and the group conditional coverage improves significantly w.r.t. SCP.
}
\label{fig:intro}
\end{center}
\end{figure}
% \vspace{-.5in}
Conformal prediction methods \cite{vovk2005algorithmic} have gained significant popularity in recent years since they offer a distribution-free approach to quantifying the uncertainty of a black box model's prediction with generalization guarantees \cite{shafer2008tutorial,angelopoulos2021gentle}. In particular, split conformal prediction (SCP) \cite{papadopoulos2002inductive} is an attractive post-hoc, model-agnostic approach that only requires access to the model's prediction and a calibration dataset. This is especially useful in settings where retraining or modifying an ML model to produce uncertainty estimates is infeasible, or when only query access to an ML model is possible (e.g., LLMs\footnote{Large Language Models}).

Given a desired miscoverage level $\alpha$ (i.e., error rate), conformal prediction methods produce prediction sets/intervals based on a black box model's prediction that are guaranteed to contain, on average, the ground truth value of the target variable with probability greater than or equal to $1-\alpha$. They often rely on estimating a quantile of a non-conformity score, which measures the disagreement between the target variable and the model prediction (e.g., absolute error), and only require that the calibration dataset be exchangeable\footnote{This is a weaker condition than requiring the samples to be independent and identically distributed (i.i.d.).} with the data samples the model will be tested on. Several works have studied how to adapt these methods to scenarios where the exchangeability assumption is violated, such as distribution shifts or time series settings \cite{gibbs2021adaptive,stankeviciute2021conformal,barber2023conformal}.
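
For concreteness, the SCP recipe for regression can be sketched as follows; the notation is introduced here only for illustration and may differ from that of Section~\ref{sec:background}. Assume a calibration set $\{(X_i,Y_i)\}_{i=1}^{n}$ and the absolute-error score $s_i = |Y_i - f(X_i)|$. Let $\hat{q}$ denote the $\lceil (n+1)(1-\alpha) \rceil$-th smallest value among $s_1,\dots,s_n$ (with $\hat{q}=\infty$ if this index exceeds $n$). The prediction interval for a new input $X_{n+1}$ is then
\[
C(X_{n+1}) = \left[\, f(X_{n+1}) - \hat{q},\; f(X_{n+1}) + \hat{q} \,\right],
\]
and, if the calibration and test samples are exchangeable, $\Pr\left(Y_{n+1} \in C(X_{n+1})\right) \geq 1-\alpha$. For instance, with $n = 99$ and $\alpha = 0.05$, $\hat{q}$ is the $95$-th smallest calibration score.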

A significant amount of work has focused on understanding the feasibility of more efficient prediction sets and stronger-than-average coverage guarantees. An ideal goal would be to achieve input-conditional coverage (i.e., the coverage guarantees hold for each possible input), which has been proven impossible to achieve in a distribution-free setting without resorting to trivial prediction sets \cite{vovk2012conditional,lei2014distribution}. Nonetheless, weaker guarantees such as local conditional coverage \cite{foygel2021limits} or group and level-set conditional coverage \cite{jung2022batch} are possible. Providing prediction sets with close-to-conditional coverage guarantees is valuable in settings where the model's prediction uncertainty differs significantly across the input space (heteroskedastic uncertainty). Essentially, we want to avoid having subsets of samples with undercoverage and/or inefficient prediction sets \cite{romano2020malice}. Marginal coverage guarantees hold only on average, and do not prevent high variation in the performance of the prediction sets across subgroups in the input space.
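
The distinction between the two guarantees can be written explicitly. In the following illustrative notation (an assumption of this sketch, not necessarily that of Section~\ref{sec:background}), $C(X)$ is the prediction set for input $X$ and $Y$ is the target. Marginal coverage requires
\[
\Pr\left(Y \in C(X)\right) \geq 1-\alpha,
\]
where the probability is taken jointly over $(X,Y)$ and the calibration data, whereas input-conditional coverage requires
\[
\Pr\left(Y \in C(X) \mid X = x\right) \geq 1-\alpha \quad \mbox{for (almost) every } x.
\]
A predictor can satisfy the first condition while overcovering some regions of the input space and undercovering others, as long as the errors average out to at most $\alpha$.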

Many works have addressed relaxations of the conditional coverage objective by modifying the non-conformity score \cite{papadopoulos2011regression,lei2014distribution,guan2023localized,han2022split,amoukou2023adaptive,seedat2023improving,ghosh2023improving}, learning the non-conformity quantile threshold 
\cite{jung2022batch,bastani2022practical,gibbs2023conformal}, or using a conformal quantile regression objective when the provided model can be retrained \cite{romano2019conformalized}. In particular, a line of work with practical guarantees has focused on the notion of local or group-conditional coverage for a pre-specified set of groups that partitions the input space \cite{vovk2003mondrian,vovk2012conditional} and for overlapping groups \cite{foygel2021limits,jung2022batch,gibbs2023conformal}.
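
As a hedged illustration of group-conditional coverage for a fixed partition, suppose the groups are given by a function $g:\mathcal{X}\to\{1,\dots,K\}$ that is chosen independently of the calibration data (the symbols $g$, $K$, $n_k$ and $\hat{q}_k$ are assumptions of this sketch). The target guarantee is
\[
\Pr\left(Y \in C(X) \mid g(X) = k\right) \geq 1-\alpha \quad \mbox{for all } k \in \{1,\dots,K\}.
\]
A Mondrian-style construction calibrates each group separately: with $n_k$ calibration samples falling in group $k$ and absolute-error scores $s_i = |Y_i - f(X_i)|$, let $\hat{q}_k$ be the $\lceil (n_k+1)(1-\alpha) \rceil$-th smallest score among those samples (with $\hat{q}_k=\infty$ if this index exceeds $n_k$), and set
\[
C(X) = \left[\, f(X) - \hat{q}_{g(X)},\; f(X) + \hat{q}_{g(X)} \,\right].
\]
Groups with few calibration samples yield large or even unbounded thresholds, which is one reason the choice of partition matters.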

\paragraph{Main Contributions.} Most group conditional conformal prediction approaches presented above rely on pre-defined groups or propose greedy approaches to slice the input space \cite{lei2014distribution} or the prediction space \cite{sesia2021conformal,bostrom2021mondrian} into equal-sized regions, which scale poorly to higher-dimensional inputs. To address this issue, we propose a method to learn a generalizable partition function of the input space (or representation mapping) into interpretable groups\footnote{We use the terms regions, groups, partitions and clusters interchangeably.} of varying sizes where the quantiles of the non-conformity scores are as homogeneous as possible when conditioned on the group. The main characteristics of the proposed approach are described next.
\begin{itemize}

    \item We adopt an adversarial approach where an agent proposes a partition function that approximates the conditional quantile of the non-conformity score, and a judge then evaluates it based on its worst group conditional miscoverage relative to that achieved by an interpretable baseline. The agent and the judge use independent datasets drawn from the same distribution. 

    \item We define a fitness score, the worst group miscoverage ratio ($\textsc{mcr}$), that allows the comparison of models across different partitions. We use this score to inform the regularization of a family of interpretable clustering functions, with the goal of selecting the partition that best generalizes in terms of $\textsc{mcr}$ over the set of partitions that accurately approximate the conditional quantile estimates of the non-conformity scores. 

    \item We learn partitions using decision trees since the identified groups can be described based on interpretable input rules---a valuable property for downstream tasks such as data collection or model selection. The partition function can be integrated with any of the group conditional conformal approaches discussed previously (see Figure \ref{fig:intro}) to produce conformal sets with group conditional guarantees on the discovered regions. 
\end{itemize}

The proposed method serves as an inexpensive alternative to a stricter and costlier auditing approach where the auditor leverages an optimization procedure to find the worst computationally identifiable miscoverage group for a given model. In our experiments, we show that we discover meaningful groups that significantly benefit from their inclusion in a group conditional conformal approach. {Code is available at {\small \url{https://github.com/trustyai-explainability/trustyai-model-trust}}}.
\paragraph{Manuscript Organization.} Section \ref{sec:background} provides a summary of conformal prediction definitions that are used throughout this manuscript and Section \ref{sec:related_work} summarizes additional related work. Section \ref{sec:region_identification} describes the proposed objective for discovering the group partition function based on non-conformity score quantiles. Section \ref{sec:region_conformal} provides the method that integrates group identification with conformal prediction. Finally, Section \ref{sec:experiments} shows experimental results that validate our proposed approach.



