\section{Conformal Prediction Under Non-Stationary Model Noise}

Quantum models are also inherently noisy. While it is well established that quantum hardware experiences noise, the extent and nature of non-stationary noise remain active areas of research~\citep{he2024stability, proctor2020detecting}. Much of the existing analysis focuses on the gate level, with comparatively less work at the level of full circuits. Notable recent works by \cite{dasgupta2020stability, dasgupta2021stability, dasgupta2022stability} explore this issue in detail.

To mitigate the effects of drift, IBM Quantum systems perform both hourly and daily recalibrations~\citep{ibm_calibration_jobs}. Furthermore, many current experimental demonstrations rely on recalibrating quantum computers immediately before execution and adjusting them during runtime~\citep{he2024stability}. Sources of non-stationary noise are diverse and include temperature fluctuations, oscillations in control equipment, and ambient laboratory conditions~\citep{proctor2020detecting}. In more extreme cases, cosmic rays have been shown to cause catastrophic multi-qubit errors approximately every ten seconds~\citep{stability_cosmic_rays}.

Having outlined conformal prediction and quantum machine learning, we now turn to their intersection. We introduce a model for PQC-based learning that incorporates time-dependent noise effects. We then formalise how this temporal variation disrupts the statistical assumptions, particularly the exchangeability of scores, that underlie standard conformal methods, motivating the need for a new approach.

\subsection{PQCs as Non-stationary Probabilistic Models}
\label{sec:NoisyModel}

To establish the formal setting, let $\mathcal{X}$ and $\mathcal{Y}$ denote a classical feature space and classical target space respectively. For a given feature $x \in \mathcal{X}$, in the angle encoding setting adopted here, a $Q$-qubit PQC prepares the quantum state
\[
    \ket{\psi(x)} = U_{N_G}(x)\circ\cdots\circ U_1(x)\ket{0}^{\otimes Q}.
\]
Each $U_i(x)$ is a unitary operator parametrised by $x$ representing the ideal $i^\text{th}$ gate, and $N_G$ is the number of gates \citep{GeneralBackgroundQuantumInformation}. The specific transformation $U(x)= U_{N_G}(x)\circ\cdots\circ U_1(x)$ depends on the circuit ansatz, any circuit parameters $\theta$, and the data-encoding scheme chosen (see Section~\ref{sub: QML}). We suppress these dependencies in the notation for clarity. Define the superoperator $\gU_{x,i}(\cdot) \equiv U_i(x)(\cdot)U_i(x)^\dagger$. In an ideal (noiseless) setting, the resulting state is described by the density matrix
\[
    \rho(x) = \ket{\psi(x)}\bra{\psi(x)}
    = U(x)\ket{0}^{\otimes Q}\bra{0}^{\otimes Q}U(x)^\dagger=\gU_{x,N_G}\circ\cdots\circ \gU_{x,1}(\ket{0}^{\otimes Q}\bra{0}^{\otimes Q}).
\]
When using quantum hardware, the measured state deviates from this ideal due to various noise processes, such as gate errors, decoherence, and crosstalk. Following the convention in \cite{NoisyChannel}, the noisy $i^\text{th}$ gate can be written as $\gE_{t,i}\circ\gU_{x,i}$, with $\gE_{t,i}$ being the time-dependent noise channel acting after the ideal gate and $t$ indexing the effective execution time of the circuit shot. This yields the noisy output state
\[
    \rho_\text{noisy}(x,t)= \gE_{t,N_G}\circ\gU_{x,N_G}\circ\cdots\circ \gE_{t,1}\circ\gU_{x,1}(\ket{0}^{\otimes Q}\bra{0}^{\otimes Q}).
\]
Although in reality individual gates and readout processes occur at different physical times, a single coarse timestamp per shot is enough to establish that the score distribution can vary with time. A computational-basis measurement is performed on the noisy state $\rho_{\text{noisy}}(x,t)$. 
Since the circuit acts on $Q$ qubits, the measurement yields one of the $2^Q$ bitstring outcomes. To interpret bitstring outcomes in the target space $\mathcal{Y}$, we define a task-dependent mapping
\[
    f : \{0,1\}^Q \to \mathcal{Y}.
\]
This mapping distributes bitstrings over a grid in $\mathcal Y$, with resolution growing as $Q$ increases. Define the random variable
\[
    \hat{Y}_{x,t} = f(b), 
    \quad \text{whenever the measurement at time }t \text{ with classical input } x\text{ yields } b\in\{0,1\}^Q.
\]
When dealing with noisy quantum measurements, the most general description uses a Positive Operator-Valued Measure (POVM) \citep[Box~2.5]{GeneralBackgroundQuantumInformation}. For a system of $Q$ qubits, the POVM consists of $2^Q$ elements, denoted $\{\Pi_j\}$, with each element corresponding to a $Q$-bit measurement outcome \(b_j\). The probability of obtaining a specific outcome $y$ is given by
\[
    \mathbb{P}(\hat Y_{x,t}=y\mid X=x) = \sum_{\{j:\;f(b_j)=y\}}\operatorname{Tr}\bigl(\rho_\text{noisy}(x,t)\,\Pi_j\bigr),
\]
due to Born's rule \citep{born1926quantenmechanik}.

In a perfect, noise-free scenario, each POVM element $\Pi_j$ is simply the projector $\ket{b_j}\bra{b_j}$. However, with possible non-stationary noise in the measurement stage, the $\Pi_j$'s can be any positive semidefinite, time-dependent operators that sum to the identity matrix \citep{Measurementerror}.
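
As an illustrative special case (the gate, noise channel, and mapping below are assumptions chosen for simplicity, not the general model), take $Q=1$ and $N_G=1$, with $U_1(x)$ a rotation about the $y$-axis so that $U_1(x)\ket{0}=\cos(x/2)\ket{0}+\sin(x/2)\ket{1}$. Suppose the gate noise is a depolarising channel with a time-dependent rate $p(t)\in[0,1]$, that is, $\gE_{t,1}(\rho)=(1-p(t))\,\rho+p(t)\,I/2$, that the measurement is ideal with $\Pi_j=\ket{b_j}\bra{b_j}$, and that $f(b)=b$ with $\mathcal{Y}=\{0,1\}$. Then
\[
    \rho_\text{noisy}(x,t) = (1-p(t))\,\ket{\psi(x)}\bra{\psi(x)} + p(t)\,\frac{I}{2},
\]
and Born's rule gives
\[
    \mathbb{P}(\hat Y_{x,t}=1\mid X=x) = (1-p(t))\sin^2\!\left(\frac{x}{2}\right) + \frac{p(t)}{2}.
\]
For a fixed input $x$ with $\sin^2(x/2)\neq 1/2$, any drift in $p(t)$ therefore changes the output distribution from one shot to the next.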

A single execution plus measurement (a shot) at time $t$ involves noise from both the PQC execution and the measurement. Denote a realisation of $\hat Y_{x,t}$ by $\hat y$. Collecting $M$ such shots at times $T=\{t_1,\ldots,t_M\}$, with $\hat y_m$ the outcome of the shot at time $t_m$, produces the sample multiset \(\mathcal{A}_{x,T}=\multiset{\hat y_m}_{m=1}^M\). We use a multiset because the same outcome may be observed more than once. Each shot carries its own timestamp $t_m$, so taking an additional shot, even on the same input, may draw from a different distribution, reflecting the drift in both the execution and measurement noise.

\subsection{Consequences of Non-stationary Noise on Split Conformal Prediction}

In standard settings, a score function is defined as a mapping 
\[
\hat S:\mathcal{X}\times\mathcal{Y}\to\mathbb{R},
\]
which assigns a real-valued score to each feature-target pair $(x,y)$. This is typically used to measure the discrepancy between the model output and an observed value. In our case, however, the situation differs: we obtain a stochastic multiset, $\mathcal{A}_{x,T}$, instead of a single deterministic value. For the score to be a well-defined deterministic function, it is therefore necessary to take $\mathcal{A}_{x,T}$ as an additional input:
\[
\hat S(x, y \,;\, \mathcal{A}_{x,T}), \quad \text{with } x \in \mathcal{X}, \; y \in \mathcal{Y}.
\]
A crucial point is that the distribution of $\mathcal{A}_{x,T}$ depends on the shot times $T$. As a result, the induced scores are inherently time-dependent whenever the noise of the quantum hardware changes over time. This can break the usual exchangeability assumption: even if the underlying data $(X_i,Y_i)$ are exchangeable, the augmented observations
\[
Z_i = \bigl(X_i, Y_i; \multiset{\hat Y_{X_i,t}}_{t\in T_i}\bigr),
\]
need not be, and therefore the corresponding scores
\[
S_i = \hat{S}\bigl(X_i, Y_i;\,\multiset{\hat Y_{X_i,t}}_{t\in T_i}\bigr),
\]
are not necessarily exchangeable. While a hypothetical score function could be constructed to remove the time dependency (e.g.,\ $\hat{S}(x, y;\,\multiset{\hat Y_{x,t}}_{t\in T})={\Tilde{S}}(x,y)$), the conformity score of any QCP procedure is designed to utilise the quantum model's output, and hence should utilise these time-dependent terms.
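
For illustration, consider a hypothetical score for a finite target space $\mathcal{Y}$ (an assumption made here for concreteness, not a score prescribed by any particular QCP procedure): one minus the empirical frequency of $y$ among the $M$ shots,
\[
\hat S(x, y \,;\, \mathcal{A}_{x,T}) = 1 - \frac{1}{M}\sum_{m=1}^{M} \mathbf{1}\{\hat y_m = y\}.
\]
By linearity of expectation,
\[
\mathbb{E}\bigl[\hat S(x, y \,;\, \mathcal{A}_{x,T}) \mid X=x\bigr] = 1 - \frac{1}{M}\sum_{m=1}^{M} \mathbb{P}(\hat Y_{x,t_m}=y \mid X=x),
\]
so the mean score at a fixed pair $(x,y)$ shifts whenever the per-shot outcome probabilities drift across the times in $T$.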

Without exchangeable scores, we cannot assert that the rank of the test score is uniformly distributed on the set \(\{1,\ldots, n+1\}\). Consequently, without assuming stationary noise, we cannot obtain guarantees of the form given in Theorem~\ref{Thm:marginal_coverage}.
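
To see how such a guarantee can fail, consider a deliberately simple sketch; the score distributions and the values of $n$ and $\alpha$ used here are illustrative assumptions rather than a model of any particular device. Write $\alpha$ for the target miscoverage level, and take a single calibration point ($n=1$) with $\alpha=1/2$, so that under the usual split conformal quantile rule the threshold is the calibration score $S_1$ and the test point is covered exactly when $S_2\le S_1$. Suppose $S_1\sim\mathrm{Unif}(0,1)$ and, because the noise has drifted between calibration and test, $S_2\sim\mathrm{Unif}(\delta,1+\delta)$ independently of $S_1$, for some $\delta\in[0,1]$. Then
\[
\mathbb{P}(S_2\le S_1) = \int_0^1 \mathbb{P}(S_2\le s)\,\mathrm{d}s = \int_\delta^1 (s-\delta)\,\mathrm{d}s = \frac{(1-\delta)^2}{2}.
\]
For $\delta=0$ the two scores are exchangeable and the coverage equals $1-\alpha=1/2$; for any $\delta>0$ it falls strictly below this nominal level.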



