\section{Appendix: Score Functions}

\label{sec:scorefunc}
The efficiency of any conformal method, measured by the size of the resulting prediction sets, depends strongly on the score function. This section therefore reviews the theory of optimal score functions, that is, scores whose prediction sets are as small as possible while maintaining coverage. We then introduce practical, sample-based score estimators suitable for the probabilistic output of a quantum model.

\subsection{What Makes a Score Function Optimal?}
\label{app: optimality S1}

The conformal prediction literature considers several optimality criteria. We follow \cite{angelopoulos2024theoretical}, focusing on two natural objectives: constructing prediction sets with minimal expected size subject to either marginal or conditional coverage.

Let \((\Omega, \mathcal{F}, \mathbb{P})\) be a probability space, and let \(X\) and \(Y\) be random variables taking values in Borel subsets \(\mathcal{X} \subseteq \mathbb{R}^{d_X}\) and \(\mathcal{Y} \subseteq \mathbb{R}^{d_Y}\), equipped with their respective Borel \(\sigma\)-algebras \(\mathcal{F}_\mathcal{X}\) and \(\mathcal{F}_\mathcal{Y}\). Let \(P_{X,Y}\) denote their joint law and \(P_X\) the marginal law of \(X\). Consider measurable joint prediction sets \(B \in \mathcal{F}_\mathcal{X} \otimes \mathcal{F}_\mathcal{Y}\), with sections
\[
B(x) \coloneqq \{y : (x,y) \in B\}.
\]
Let \(|B(x)|\) denote the Lebesgue measure of the section \(B(x)\subseteq\mathcal Y\). Given a miscoverage level \(\alpha \in (0,1)\), the two optimisation problems are:
\begin{enumerate}
    \item \textbf{Marginal coverage objective:}
    \label{min:1}
    \[
    \underset{B}{\operatorname{argmin}} \, \mathbb{E}\left[\lvert B(X)\rvert\right]
    \quad \text{subject to} \quad
    \mathbb{P}\big(Y \in B(X)\big) \geq 1 - \alpha.
    \]

    \item \textbf{Conditional coverage objective:}
    \label{min:2}
    \[
    \underset{B}{\operatorname{argmin}} \, \mathbb{E}\left[\lvert B(X)\rvert\right]
    \quad \text{subject to} \quad
    \mathbb{P}\big(Y \in B(x) \mid X = x\big) \geq 1 - \alpha
    \quad \text{for } P_X\text{-a.e. } x.
    \]
\end{enumerate}

Assume that \(P_{X,Y}\) is absolutely continuous with respect to Lebesgue product measure on \(\mathcal X \times \mathcal Y\), with joint density \(p(x,y)\), marginal density \(p_X(x)>0\) Lebesgue-a.e., and conditional density
\[
p(y\mid x) = \frac{p(x,y)}{p_X(x)}.
\]
For simplicity, assume that the relevant density values have no flat spots: for the marginal problem, \(p(Y\mid X)\) has no atoms under \(P_{X,Y}\); for the conditional problem, for \(P_X\)-a.e. \(x\), the random variable \(p(Y\mid x)\) under \(Y\mid X=x\) has no atoms. 

In conformal prediction, prediction sets are typically constructed as sublevel sets of a score function \(\hat S(x,y)\):
\[
C_\lambda(x) \coloneqq \{y\in\mathcal{Y} : \hat S(x,y) \leq \lambda\},
\]
where \(\lambda \in \mathbb R\) is calibrated to achieve the desired coverage. For any fixed prediction set \(B\), choosing
\[
\hat S(x,y)=1-\mathds{1}_{B(x)}(y),\qquad \lambda=0,
\]
gives \(C_\lambda(x)=B(x)\). The optimal-score problem is therefore the stronger problem of identifying scores whose sublevel sets generate optimal prediction sets across all coverage levels.
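
To see why the indicator choice above recovers \(B\), note that this score takes only the values \(0\) and \(1\), so that
\[
C_0(x)=\{y\in\mathcal Y : 1-\mathds{1}_{B(x)}(y)\le 0\}=\{y\in\mathcal Y : \mathds{1}_{B(x)}(y)=1\}=B(x).
\]
More generally, \(C_\lambda(x)=\emptyset\) for \(\lambda<0\), \(C_\lambda(x)=B(x)\) for \(0\le\lambda<1\), and \(C_\lambda(x)=\mathcal Y\) for \(\lambda\ge 1\). An indicator score can therefore represent only one non-trivial prediction set and cannot be tuned across coverage levels.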

\begin{definition}[Optimal score functions]
\label{def:opt_score class}
A score function \(\hat S:\mathcal X\times\mathcal Y\to\mathbb R\) is optimal with respect to optimisation problem \(i\) if, for every \(\alpha\in(0,1)\), there exists \(\lambda\in\mathbb R\) such that \(C_\lambda\) is an optimal solution to problem \(i\), up to null modifications. The class of such score functions is denoted by \(\mathcal S_i\).
\end{definition}

We now give a simple sufficient condition for optimality under the marginal coverage objective.

\subsection{Optimal Scores for Marginal Coverage \texorpdfstring{(\(\mathcal{S}_1\))}{(S1)}}
\label{sec:S_1}

Adapting arguments from \cite{lei2014classification,sadinle2019least,kato2023review}, we show that any strictly decreasing transformation of the conditional density yields an optimal score for the marginal coverage objective.

\begin{theorem}[Sufficient condition for marginal optimality]
\label{thm:oracle max efficiency}
Under the assumptions above, suppose there exists a strictly decreasing function
\(\phi:[0,\infty)\to\mathbb R\), such that
\[
\hat S(X,Y)=\phi(p(Y\mid X))
\]
\(P_{X,Y}\)-almost surely. Then \(\hat S\in\mathcal S_1\).
\end{theorem}

For the proof, see Appendix \ref{app:oracle max efficiency}. A similar argument is also presented in \cite{sadinle2019least}, but for the case of discrete \(\mathcal Y\).

Theorem \ref{thm:oracle max efficiency} states that any score function that is a strictly decreasing transformation of the conditional probability density is optimal for producing minimal prediction sets with marginal coverage. Different choices of \(\phi\) recover both well-known scores and less common variants. For instance, \(\phi(t)=-t\) yields \(\hat S(x,y)=-p(y\mid x)\); \(\phi(t)=t^{-1}\) on \(t>0\) (with appropriate handling at \(t=0\)) yields \(\hat S(x,y)=p(y\mid x)^{-1}\); and \(\phi(t)=-\log t\) on \(t>0\) (again with appropriate handling at \(t=0\)) yields the negative log-density score \(\hat S(x,y)=-\log p(y\mid x)\).
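
As a brief check that these choices lead to the same prediction sets, let \(t\) denote the argument of \(\phi\). For any strictly decreasing \(\phi\) and any \(\lambda\) in the range of \(\phi\),
\[
C_\lambda(x)=\{y\in\mathcal Y : \phi(p(y\mid x))\le\lambda\}=\{y\in\mathcal Y : p(y\mid x)\ge \phi^{-1}(\lambda)\}.
\]
For the three examples above, the density threshold \(\phi^{-1}(\lambda)\) is
\[
\phi(t)=-t:\ \ -\lambda,
\qquad
\phi(t)=t^{-1}:\ \ \lambda^{-1},
\qquad
\phi(t)=-\log t:\ \ e^{-\lambda}.
\]
In each case \(C_\lambda(x)\) is a superlevel set of \(p(\cdot\mid x)\), and only the parametrisation of the threshold changes. Since the three scores are strictly increasing transformations of one another, a threshold calibrated as an empirical quantile of the scores selects the same set in each case.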

In many machine learning settings, the model provides an explicit density estimate, so any of these forms can be implemented directly. In our setting, the true conditional density \(p(y\mid x)\) is unknown and must instead be estimated from PQC shots.

In classical regression over \(\mathbb{R}^n\), there is a particularly appealing connection to the widely used Euclidean distance score function,
\[
\hat S_\text{Euc}(x,y)=\Vert y-f(x)\Vert_2,
\]
where \(f(x)\) is a point prediction model. Theorem \ref{thm:oracle max efficiency} implies that \(\hat S_\text{Euc}\) is marginally optimal whenever there exists a strictly decreasing function \(\phi\) such that
\[
\phi\bigl({p}(y\mid x)\bigr) = \Vert y-f(x)\Vert_2
\]
for \(P_{X,Y}\)-almost every \((x,y)\). Equivalently, on the relevant support,
\[
{p}(y\mid x) = \phi^{-1}\Bigl(\Vert y-f(x)\Vert_2\Bigr).
\]
Thus, the commonly used score function \(\Vert y-f(x)\Vert_2\) is optimal for marginal coverage in the sense of optimisation problem \ref{min:1} whenever the conditional density is radially symmetric about \(f(x)\), with a radial profile that is decreasing as the distance from \(f(x)\) increases and does not depend on \(x\). This condition is satisfied, for example, when \(f(x)\) is the conditional mean of a homoscedastic isotropic Gaussian model, but it generally fails in the presence of skewed conditional distributions, anisotropy, or heteroscedasticity.
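
As a worked instance of this condition, suppose (purely for illustration) that \(Y\mid X=x\sim\mathcal N\bigl(f(x),\sigma^2 I_n\bigr)\), the isotropic Gaussian on \(\mathbb R^n\) with a fixed \(\sigma>0\). Then
\[
p(y\mid x)=(2\pi\sigma^2)^{-n/2}\exp\left(-\frac{\Vert y-f(x)\Vert_2^2}{2\sigma^2}\right),
\]
and solving for the distance gives
\[
\Vert y-f(x)\Vert_2=\phi\bigl(p(y\mid x)\bigr),
\qquad
\phi(t)=\sqrt{-2\sigma^2\log\bigl((2\pi\sigma^2)^{n/2}\,t\bigr)},
\]
where \(\phi\) is strictly decreasing on the range \(\bigl(0,(2\pi\sigma^2)^{-n/2}\bigr]\) of the density. The sublevel sets of \(\hat S_\text{Euc}\) are then Euclidean balls centred at \(f(x)\) whose radius does not depend on \(x\). If \(\sigma\) were allowed to vary with \(x\), the same calculation would produce a transformation \(\phi\) that depends on \(x\), so the condition would no longer hold.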

To deal with more general distributions, we look towards probability density estimators. For example, given an appropriate choice of \(k\), \(\hat S_\text{k-NN}\) can be viewed as a proxy for a negative k-NN density estimator \citep{zhao2022analysis,knn_density_estimation}. Under suitable consistency conditions, this suggests that \(\hat S_{\text{k-NN}}\) asymptotically approaches a score in the sufficient optimality class as the number of samples \(M \rightarrow \infty\). Similarly, since kernel density estimation is a consistent density estimator under an appropriate choice of bandwidth \(h\) \citep{parzen1962estimation, devroye1986strong, davis2011remarks}, \(\hat S_\text{KDE}\) can also asymptotically approach this class. This theory assumes access to samples taken from the true conditional distributions; however, it provides motivation for the general case.
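
For concreteness, the standard forms of these two estimators are recalled here; the notation is assumed for illustration, and the exact definitions of \(\hat S_\text{k-NN}\) and \(\hat S_\text{KDE}\) used elsewhere may differ in detail. Given samples \(y_1,\dots,y_M\) drawn for a fixed input \(x\), let \(r_k(y)\) be the distance from \(y\) to its \(k\)-th nearest sample and let \(V_{d_Y}\) be the volume of the unit ball in \(\mathbb R^{d_Y}\). Then
\[
\hat p_{\text{k-NN}}(y\mid x)=\frac{k}{M\,V_{d_Y}\,r_k(y)^{d_Y}},
\qquad
\hat p_{\text{KDE}}(y\mid x)=\frac{1}{M h^{d_Y}}\sum_{j=1}^{M}K\left(\frac{y-y_j}{h}\right),
\]
where \(K\) is a kernel integrating to one. Since \(\hat p_{\text{k-NN}}\) is a strictly decreasing function of \(r_k(y)\), the \(k\)-th nearest-neighbour distance is itself a strictly decreasing transformation of the k-NN density estimate, mirroring the form required by Theorem \ref{thm:oracle max efficiency}.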

\subsection{Optimal Scores for Conditional Coverage \texorpdfstring{(\(\mathcal{S}_2\))}{(S2)}}
\label{sec:S_2}
For the conditional coverage objective, a similar approach can be taken. We first recall the high-density level set \citep{highdensityregion}:
\[
    H_x(t)=\left\{y\in \mathcal Y : p(y\mid x)\geq t\right\}.
\]
This is the set of all outcomes \(y\) whose conditional density is at least the threshold \(t\). The next theorem uses this set to give a sufficient condition for a score function to be optimal under the conditional coverage objective.

\begin{theorem}[Sufficient condition for conditional optimality]
\label{thm:oracle with conditional}
Under the assumptions above, suppose there exists a strictly increasing function \(\phi:[0,1]\rightarrow\mathbb{R}\) such that
\[  
    \hat S(x,y)=\phi\left(\int_{H_x(p(y\mid x))}p(y'\mid x)  \mathrm{d} y'\right)
\]
for \(P_{X,Y}\)-almost every \((x,y)\in\mathcal X \times \mathcal Y\). Then \(\hat S\in \mathcal S_2\).
\end{theorem}
For the proof, see Appendix \ref{app:proof-oracle with conditional}. A similar argument is also presented in \cite{angelopoulos2024theoretical,romano2020classification}, but for the case of discrete \(\mathcal Y\).
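
To illustrate the role of the level-set probability, take \(\phi(z)=z\) and write \(m_x(y)\coloneqq\int_{H_x(p(y\mid x))}p(y'\mid x)\,\mathrm{d}y'\); the symbols \(m_x\), \(T\) and \(F_T\) are introduced only for this sketch. Fix \(x\), let \(T=p(Y\mid x)\) under \(Y\mid X=x\), and let \(F_T\) be its CDF. Because \(T\) has no atoms,
\[
m_x(y)=\mathbb P\bigl(T\ge p(y\mid x)\mid X=x\bigr)=1-F_T\bigl(p(y\mid x)\bigr),
\]
so \(m_x(Y)=1-F_T(T)\) is uniformly distributed on \([0,1]\) given \(X=x\). Hence
\[
\mathbb P\bigl(Y\in C_\lambda(x)\mid X=x\bigr)=\mathbb P\bigl(m_x(Y)\le\lambda\mid X=x\bigr)=\lambda,
\qquad \lambda\in[0,1],
\]
and the choice \(\lambda=1-\alpha\) gives conditional coverage exactly \(1-\alpha\), with \(C_{1-\alpha}(x)\) equal, up to null sets, to a high-density level set of \(p(\cdot\mid x)\).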
 

Compared with the simpler sufficient condition for \(\mathcal S_1\), this form is more abstract and can be harder to reduce to immediately implementable scores. However, in parametric models the level-set probability often has a closed form. For example, in regression over \(\mathbb{R}\), suppose that for each \(x\in \mathcal{X}\) we have 
\[
Y\mid{X=x} \sim \mathcal N(\mu(x),\sigma^2(x)),
\] 
where \(\mathcal N(\mu(x),\sigma^2(x))\) denotes the univariate normal distribution with mean \(\mu(x)\) and variance \(\sigma^2(x)\). Then the level set corresponding to density value \(p(y\mid x)\) is the symmetric interval around \(\mu(x)\) with radius \(\lvert y-\mu(x)\rvert\). Hence,
\[
\mathbb P \big(Y\in H_x(p(y\mid x))\mid X=x\big)
= 2\Phi \left(\frac{\lvert y-\mu(x)\rvert}{\sigma(x)}\right)-1,
\]
where \(\Phi\) denotes the standard normal CDF. Therefore any score of the form
\[
\hat S(x,y)  =  \phi \left(2\Phi \left(\frac{\lvert y-\mu(x)\rvert}{\sigma(x)}\right)-1\right),
\]
for some strictly increasing \(\phi:[0,1]\to\mathbb R\), satisfies the sufficient condition above. In particular, on \([0,1)\), where the level-set probability takes its values for every finite \(y\), choosing
\[
\phi(z) = \Phi^{-1} \Big(\frac{z+1}{2}\Big),
\] 
yields the score
\[
\hat S(x,y) = \frac{\lvert y-\mu(x)\rvert}{\sigma(x)},
\]
which parallels the Euclidean distance score discussed in Section \ref{sec:S_1}. Moreover, this construction extends to any conditional distribution whose density is symmetric about its centre and strictly decreasing in the distance from that centre.
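
As a quick check of this choice of \(\phi\), followed by a numerical illustration under the Gaussian model above, write \(r=\lvert y-\mu(x)\rvert/\sigma(x)\). Then
\[
\phi\bigl(2\Phi(r)-1\bigr)=\Phi^{-1}\left(\frac{(2\Phi(r)-1)+1}{2}\right)=\Phi^{-1}\bigl(\Phi(r)\bigr)=r.
\]
Conditional coverage \(1-\alpha\) requires \(2\Phi(\lambda)-1=1-\alpha\), that is,
\[
\lambda=\Phi^{-1}\left(1-\frac{\alpha}{2}\right).
\]
For example, \(\alpha=0.1\) gives \(\lambda=\Phi^{-1}(0.95)\approx 1.645\), so the oracle prediction set is the interval \(\bigl[\mu(x)-1.645\,\sigma(x),\ \mu(x)+1.645\,\sigma(x)\bigr]\), of length approximately \(3.29\,\sigma(x)\).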


When the form of the conditional distribution is unknown, we again resort to a sample-based probability density estimator. Using this estimator and taking \(\phi(z) = z\), we obtain the \(\hat S_\text{HDR}\) score. Under suitable consistency conditions, this score asymptotically approaches a score satisfying the sufficient condition for \(\mathcal{S}_2\) as \(M\rightarrow\infty\).
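
One way such a sample-based estimate could be formed is sketched here; the density estimate \(\hat p\) and the samples \(y_1,\dots,y_M\) are assumptions made for illustration, and the definition of \(\hat S_\text{HDR}\) used elsewhere may differ in detail. If \(y_1,\dots,y_M\) are drawn from (an approximation of) \(Y\mid X=x\) and \(\hat p(\cdot\mid x)\) is a density estimate, the level-set probability can be approximated by the fraction of samples whose estimated density is at least that of the candidate \(y\):
\[
\int_{H_x(p(y\mid x))}p(y'\mid x)\,\mathrm{d}y'
\;\approx\;
\frac{1}{M}\sum_{j=1}^{M}\mathds{1}\bigl\{\hat p(y_j\mid x)\ge \hat p(y\mid x)\bigr\}.
\]
Candidates in high-density regions are exceeded by few samples and so receive small scores, whereas candidates in the tails receive scores close to one.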

\subsection{Ties to Adaptive Quantum Conformal Prediction}

The optimisation problems and results above do not depend on a conformal construction. Rather, they characterise optimal prediction sets through level sets of score functions, and identify \emph{subclasses} of score functions that attain optimality under the marginal and conditional coverage objectives when the true conditional density \( p(y \mid x) \) is known. Therefore this connection serves as structural guidance in the quantum conformal setting, as opposed to a formal guarantee. The analysis does not quantify how approximation error in \( p(y \mid x) \), finite measurement effects, or calibration variability influence the resulting set sizes.

However, under ideal conditions, where measurements closely reflect \( Y \mid X \) and the calibration dataset is sufficiently large to estimate thresholds accurately, applying QCP with the described score functions is expected to produce prediction sets that are close to optimal in expectation. The same reasoning extends to AQCP: since the optimality statements hold for any fixed miscoverage level \(\alpha \in (0,1)\), they apply equally to the time-varying levels \(\alpha_i\) specified by the adaptive procedure.

