\section{Real-World Perturbations for Time Series Data}\label{app:perturbations}


Here, we describe five techniques to augment time series: \textit{jitter}, \textit{scaling}, \textit{magnitude warping}, \textit{time warping}, and \textit{window warping}.
Jitter, scaling, and magnitude warping alter the amplitude of the signal, whereas time warping and window warping alter its temporal dynamics.

\paragraph{Jitter}
Jittering, i.e., adding Gaussian noise to a time series, is a simple yet effective transformation-based data augmentation method~\citep{iwana2021empirical}.
This operation can be mathematically represented as:
%
\begin{equation}
    \tilde{x} = x_1 + \delta_1, \ldots, x_t + \delta_t, \ldots, x_T + \delta_T,
\end{equation}
%
where Gaussian noise $\delta_t \sim \mathcal{N}(0, \sigma^2)$ is drawn independently for each time step $t$.
The standard deviation $\sigma > 0$ of this noise is a tunable hyperparameter. 
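As an illustration, jitter can be sketched in a few lines of NumPy, assuming a univariate series stored as a 1-D array. The function name \texttt{jitter} and the use of \texttt{numpy.random.default\_rng} are illustrative choices, not the implementation used in our experiments.

```python
import numpy as np

def jitter(x, sigma, rng=None):
    """Hypothetical helper: add i.i.d. Gaussian noise N(0, sigma^2) to every time step of x."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    # One independent noise sample delta_t per time step t.
    return x + rng.normal(loc=0.0, scale=sigma, size=x.shape)
```

For example, \texttt{jitter(np.sin(np.linspace(0, 2 * np.pi, 100)), sigma=0.05)} returns a noisy sine wave of the same length.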


\paragraph{Scaling}
Scaling changes the global magnitude of a time series by multiplying all of its elements by the same random scalar.
With a scaling parameter denoted as $\delta$, the scaling process is expressed as:
%
\begin{equation}
    \tilde{x} = \delta x_1, \ldots, \delta x_t, \ldots, \delta x_T.
\end{equation}
%
The scaling parameter $\delta$ can be drawn from a Gaussian distribution $\delta \sim \mathcal{N}(1, \sigma^2)$, with $\sigma$ as a tunable hyperparameter, or selected at random from a predetermined set.
Note that the term scaling has a different meaning in time series than in image processing, where it is associated with contrast.
In time series, scaling strictly refers to rescaling the magnitude of the elements, without altering the duration of the series.
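A minimal NumPy sketch of scaling, assuming $\delta \sim \mathcal{N}(1, \sigma^2)$ (rather than a predetermined set) and a 1-D input array; the function name \texttt{scale} is hypothetical. The key point is that a single factor is shared by all time steps, so the shape of the series is preserved.

```python
import numpy as np

def scale(x, sigma, rng=None):
    """Hypothetical helper: multiply the whole series by one scalar delta ~ N(1, sigma^2)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    delta = rng.normal(loc=1.0, scale=sigma)  # a single factor shared by all time steps
    return delta * x
```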

\paragraph{Magnitude warping}

Magnitude warping, proposed by \cite{um2017data}, is a data augmentation strategy designed specifically for time series. It alters the magnitude of a signal by multiplying it with a smooth random curve.
Formally, the augmented time series is given by:
%
\begin{equation}
    \tilde{x} = \delta_1x_1, \ldots , \delta_t x_t, \ldots , \delta_T x_T,
\end{equation}
%
where the warping factors $\delta_1, \ldots , \delta_t, \ldots , \delta_T$ are obtained by evaluating, at every time step, a cubic spline $S(u)$ that interpolates the knots $u = u_1, \ldots , u_i, \ldots , u_I$.
The height of each knot $u_i$ is drawn from $\mathcal{N}(1, \sigma^2)$, and both the number of knots $I$ and the standard deviation $\sigma$ are tunable hyperparameters.
The core concept of magnitude warping is to introduce minor variations in the data by either amplifying or diminishing random segments of the time series.
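The following NumPy-only sketch illustrates magnitude warping under stated assumptions: the $I$ knots are evenly spaced over the time axis, and the spline uses natural boundary conditions (zero second derivative at both ends) so that no SciPy dependency is needed. Other boundary conditions are equally valid; the helper names are hypothetical and this is not the reference implementation of \cite{um2017data}.

```python
import numpy as np

def natural_cubic_spline(knots_x, knots_y, x_eval):
    """Evaluate the natural cubic spline through (knots_x, knots_y) at x_eval (>= 2 knots)."""
    xk = np.asarray(knots_x, dtype=float)
    yk = np.asarray(knots_y, dtype=float)
    x_eval = np.asarray(x_eval, dtype=float)
    n = len(xk) - 1                      # number of segments
    h = np.diff(xk)
    # Solve for the second derivatives M_i at the knots; natural boundary: M_0 = M_n = 0.
    A = np.zeros((n + 1, n + 1))
    rhs = np.zeros(n + 1)
    A[0, 0] = A[n, n] = 1.0
    for i in range(1, n):
        A[i, i - 1] = h[i - 1]
        A[i, i] = 2.0 * (h[i - 1] + h[i])
        A[i, i + 1] = h[i]
        rhs[i] = 6.0 * ((yk[i + 1] - yk[i]) / h[i] - (yk[i] - yk[i - 1]) / h[i - 1])
    M = np.linalg.solve(A, rhs)
    # Index j of the segment [x_j, x_{j+1}] containing each evaluation point.
    j = np.clip(np.searchsorted(xk, x_eval, side="right") - 1, 0, n - 1)
    hj = xk[j + 1] - xk[j]
    wa = (xk[j + 1] - x_eval) / hj       # weight of the left knot
    wb = 1.0 - wa                        # weight of the right knot
    return (wa * yk[j] + wb * yk[j + 1]
            + ((wa**3 - wa) * M[j] + (wb**3 - wb) * M[j + 1]) * hj**2 / 6.0)

def magnitude_warp(x, sigma, n_knots=4, rng=None):
    """Multiply x by a smooth random curve whose knot heights are drawn from N(1, sigma^2)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    steps = np.arange(len(x), dtype=float)
    knot_pos = np.linspace(0.0, len(x) - 1, n_knots)       # evenly spaced knots (assumption)
    knot_height = rng.normal(loc=1.0, scale=sigma, size=n_knots)
    delta = natural_cubic_spline(knot_pos, knot_height, steps)  # one factor delta_t per step
    return delta * x
```

Because the spline interpolates knot heights centred at 1, small values of $\sigma$ yield factors $\delta_t$ close to 1, i.e., only mild amplification or attenuation of local segments.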

\paragraph{Time warping}
Time warping stretches or compresses the time axis to induce variability in the temporal dynamics.
Given a univariate time series $x\in \R^T$ subjected to a time-warping perturbation characterized by a smooth warping path, the resulting augmented time series can be denoted as:
\begin{equation}
    \tilde{x} =  x_{\phi(1)}, \dots, x_{\phi(t)}, \dots, x_{\phi(T)}.
\end{equation}
%
In this representation, $\phi(\cdot)$ is a time-warping function, which modifies the time indices based on a smooth curve. 
In previous works~\citep{le2016data, um2017data, iwana2021time}, this curve was a cubic spline $S(u)$ with knots $u = u_1, \dots, u_i, \dots, u_I$, where each knot height $u_i$ was drawn from a normal distribution, $u_i \sim \mathcal{N}(1, \sigma^2)$.
However, in this work we consider a different approach to temporally shift the time series. 



\paragraph{Window warping}
Window warping, a variant of time warping, was introduced by \cite{le2016data}.
In this method, a random segment of the time series, starting at $p\in\mathbb{N}$ with $0 < p < T$ and ending at $p + \lceil\sigma\cdot T\rceil$, is selected and either stretched by a factor of 2 or contracted by a factor of $\frac{1}{2}$.
The warped segment is then interpolated back into the original time series.
Although the stretching and contracting factors are preset to 2 and $\frac{1}{2}$, respectively, they can be tuned to other values as needed.
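A minimal NumPy sketch of window warping follows. It rests on a few assumptions not fixed by the description above: the window has length $w = \lceil\sigma T\rceil$, its start $p$ is drawn uniformly so that the window ends before $T$, linear interpolation (\texttt{np.interp}) is used to warp the window, and the concatenated series is finally resampled to the original length $T$, as is common in practical implementations. The helper names are hypothetical.

```python
import numpy as np

def _resample(segment, new_len):
    """Hypothetical helper: linearly resample a 1-D array to new_len points."""
    old = np.linspace(0.0, 1.0, num=len(segment))
    new = np.linspace(0.0, 1.0, num=new_len)
    return np.interp(new, old, segment)

def window_warp(x, sigma=0.1, scales=(0.5, 2.0), rng=None):
    """Stretch or contract a random window of length ceil(sigma * T), then restore length T."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    T = len(x)
    w = int(np.ceil(sigma * T))              # window length (requires T - w >= 2)
    p = int(rng.integers(1, T - w))          # start index with 0 < p and p + w < T
    factor = rng.choice(scales)              # contract (0.5) or stretch (2.0)
    window = x[p:p + w]
    warped = _resample(window, max(1, int(round(factor * w))))
    stretched = np.concatenate([x[:p], warped, x[p + w:]])
    return _resample(stretched, T)           # interpolate back to the original length T
```

Since the factors are passed as the \texttt{scales} argument, values other than 2 and $\frac{1}{2}$ can be tried without changing the function.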


\section{Proof of Theorem~\ref{th:smooth_coverage}}\label{app:proof_smooth_conformal_set}

\smooth*

\begin{proof}
From Corollary~2 of \cite{li2021tss}, we know that if the maximum interpolation error satisfies \autoref{eq:max_interp_error}, then it is guaranteed that $\forall \lambda \in \set{Z}_\lambda : y_A = \argmax_y G_\phi (y\, |\, \phi(x, \lambda))$.
Therefore, if we define the certified robustness radius as:
\begin{equation*}
    R_\pi\; \eqdef\; \frac{\sigma}{2} \min_{1\leq j \leq N} \left( \phi^{-1} (p_A^{(j)}) - \phi^{-1}(p_B^{(j)})  \right),
\end{equation*}
we can link the observed score $S_\phi(\tilde{x}^{(n+1)}, y)$ with the unobserved score $S_\phi(x^{(n+1)}, y)$ for any given $y \in \set{Y}$.
Thus, let us consider the definition of the conformal set as in \autoref{eq:smooth_conformal_set}:
\begin{equation*}
\begin{aligned}
    \prob\left[y^{(n+1)} \in \set{C}_\pi (\tilde{x}^{(n+1)})\right] &= \prob \left[ S_\phi( \tilde{x}^{(n+1)}, y^{(n+1)}) \leq Q_{1 - \alpha} (\{S^{(i)}\}_{i\in \set{D}_{cal}}) + R_\pi \right]\\
    \text{(\autoref{eq:inequality_conformity_score})}\quad  &\geq \prob \left[ S_\phi( x^{(n+1)}, y^{(n+1)}) + R_\pi \leq Q_{1 - \alpha} (\{S^{(i)}\}_{i\in \set{D}_{cal}}) + R_\pi \right]\\
    &= \prob\left[ S_\phi( x^{(n+1)}, y^{(n+1)}) \leq Q_{1 - \alpha} (\{S^{(i)}\}_{i\in \set{D}_{cal}})\right]\\
    \text{(\autoref{eq:conformal_score})}\quad &\geq 1 - \alpha
\end{aligned}
\end{equation*}

    
\end{proof}
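To make the inflated threshold concrete, the prediction set $\set{C}_\pi$ can be sketched as follows. This is an illustration under assumptions, not our implementation: we read $\phi^{-1}$ in $R_\pi$ as the inverse standard Gaussian CDF, as in randomized smoothing; the bounds $p_A^{(j)}, p_B^{(j)}$ are taken as given inputs; the quantile uses the common finite-sample rank $\lceil (n+1)(1-\alpha) \rceil$; and lower scores mean more conforming labels. All function names are hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np

def certified_radius(sigma, p_a, p_b):
    """R_pi = sigma / 2 * min_j (Phi^{-1}(p_A^(j)) - Phi^{-1}(p_B^(j)))."""
    inv_cdf = NormalDist().inv_cdf          # inverse standard Gaussian CDF (needs 0 < p < 1)
    return 0.5 * sigma * min(inv_cdf(pa) - inv_cdf(pb) for pa, pb in zip(p_a, p_b))

def conformal_quantile(cal_scores, alpha):
    """Finite-sample corrected (1 - alpha) quantile of the calibration scores."""
    s = np.sort(np.asarray(cal_scores, dtype=float))
    k = math.ceil((len(s) + 1) * (1 - alpha))   # rank of the quantile
    return math.inf if k > len(s) else float(s[k - 1])

def smoothed_prediction_set(test_scores, cal_scores, alpha, radius):
    """Labels y with S(x_tilde, y) <= Q_{1-alpha} + R_pi; test_scores[y] scores label y."""
    threshold = conformal_quantile(cal_scores, alpha) + radius
    return [y for y, s in enumerate(test_scores) if s <= threshold]
```

Setting \texttt{radius=0} recovers the standard conformal set; a positive radius can only add labels, which is what allows the coverage guarantee to transfer from $x^{(n+1)}$ to $\tilde{x}^{(n+1)}$.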



\section{Lipschitz Constant for Cubic Spline Interpolation}\label{app:lipschitz_constant}


For magnitude and window warping, bounding the maximum interpolation error requires the largest absolute value of the derivative of the cubic spline used for interpolation.
This maximum is the Lipschitz constant of the spline.
A cubic spline is a piecewise function whose pieces are polynomials of degree at most three.
Assume we have a sequence of $n+1$ knots, $(x_0, y_0)$ through $(x_n, y_n)$.
On each interval $[x_{i-1}, x_i]$, $i = 1, \ldots, n$, the spline is given by the segment $q_i(x)$:
\begin{equation}
    \begin{aligned}
        q_i(x) &= (1-t(x)) y_{i-1} + t(x) y_i + t(x)(1-t(x))((1-t(x)) a_i + t(x) b_i),\\ \text{with}&\quad
        t(x) = \frac{x - x_{i-1}}{x_i - x_{i-1}}, \quad
        a_i = k_{i-1}(x_i - x_{i-1}) - (y_i - y_{i-1}), \quad
        b_i = -k_i(x_i - x_{i-1}) + (y_i - y_{i-1}),
    \end{aligned}
\end{equation} 
%
where $k_i$ denotes the first-order derivative (slope) of the spline at the knot $(x_i, y_i)$.
To compute the derivative of the cubic spline function $q_i(x)$, we first need to recognize that $q_i(x)$ is a composite function involving $t$ which itself is a function of $x$. 
Therefore, we will use the chain rule to find the derivative, i.e. $\frac{dq_i}{dx} = \frac{dq_i}{dt} \cdot \frac{dt}{dx}$.
Thus, the first order derivative is defined as:
\begin{equation}
    \frac{dq_i}{dx} = \frac{y_{i} - y_{i-1}}{x_{i} - x_{i-1}} + (1 - 2t)\frac{a_i(1-t)+b_i t}{x_{i}-x_{i-1}} + t(1-t)\frac{b_i - a_i}{x_{i} - x_{i-1}},
\end{equation}
%
where we omit the dependence of $t$ on $x$ for brevity.
This derivative represents the rate of change of the cubic spline segment $q_i(x)$ with respect to $x$, and it varies along different segments of the spline depending on the values of $x_i, x_{i-1}, y_i, y_{i-1}, k_i,$ and $k_{i-1}$.
From the spline's derivative, the Lipschitz constant can be estimated by finding the maximum of its absolute values. 
The maximum value of the first derivative occurs either at the endpoints of a segment (i.e., at the knots $x_{i-1}$ or $x_i$) or at a critical point within the segment where the second-order derivative is zero.
The second-order derivative $\frac{d^2q_i}{dx^2}$ is the rate of change of the first derivative, so its zeros locate the interior extrema of $\frac{dq_i}{dx}$.
Thus, let us compute \( \frac{d^2q_i}{dx^2} \) and set it equal to zero.
The second derivative of the cubic spline function $q_i(x)$ is:

\begin{equation}
    \frac{d^2q_i}{dx^2} = 2 \frac{b_i - 2a_i+(a_i - b_i)3t}{(x_i - x_{i-1})^2}.
\end{equation}

Next, we set this second derivative to zero and solve for \( x \). This will give us the points where the curvature of the spline segment changes, indicating inflection points. 
The solution to the equation \( \frac{d^2q_i}{dx^2} = 0 \) is:

\begin{equation}
    t=\frac{2a_i - b_i}{3(a_i - b_i)},\quad \text{or} \quad x = \frac{(2a_i - b_i)x_i + (a_i - 2b_i)x_{i-1}}{3(a_i - b_i)}.
\end{equation}

This formula gives the inflection point of the spline segment between $x_{i-1}$ and $x_i$, where the curvature changes sign; it is only relevant if $a_i \neq b_i$ and $0 \leq t \leq 1$.
The maximum absolute value of the first derivative on the segment is obtained by evaluating $\frac{dq_i}{dx}$ at this $t$ (if it lies in the segment) and at the endpoints $t = 0$ and $t = 1$, and taking the largest absolute value. Where this maximum occurs depends on the values of $x_i$, $x_{i-1}$, $y_i$, $y_{i-1}$, $k_i$, and $k_{i-1}$.
The global maximum of the first derivative of the entire cubic spline is the largest value found among all segments.
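The procedure above can be sketched in plain Python. The sketch takes the knots and the knot slopes $k_i$ as given; computing the slopes of a particular spline (e.g., a natural spline) requires solving a linear system and is omitted. The function names are hypothetical.

```python
import math

def segment_slope(t, h, dy, a, b):
    """dq_i/dx at local parameter t in [0, 1], following the first-derivative formula."""
    return dy / h + (1 - 2 * t) * (a * (1 - t) + b * t) / h + t * (1 - t) * (b - a) / h

def spline_lipschitz(xs, ys, ks):
    """Largest |dq_i/dx| over all segments; ks[i] is the slope k_i at knot i."""
    best = 0.0
    for i in range(1, len(xs)):
        h = xs[i] - xs[i - 1]
        dy = ys[i] - ys[i - 1]
        a = ks[i - 1] * h - dy
        b = -ks[i] * h + dy
        candidates = [0.0, 1.0]                  # segment endpoints
        if not math.isclose(a, b):               # a_i == b_i: no interior extremum
            t_star = (2 * a - b) / (3 * (a - b))
            if 0.0 < t_star < 1.0:               # keep the critical point only if inside
                candidates.append(t_star)
        best = max(best, max(abs(segment_slope(t, h, dy, a, b)) for t in candidates))
    return best
```

For instance, the knots of $y = x^2$ at $x = 0, 1, 2$ with exact slopes $0, 2, 4$ reproduce $x^2$, so \texttt{spline\_lipschitz([0, 1, 2], [0, 1, 4], [0, 2, 4])} returns \texttt{4.0}, the slope at $x = 2$.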




\section{Additional Details on Experimental Procedures}\label{app:settings}


In \autoref{tab:network-params}, we provide an overview of the architecture and key features of two neural network models: a Convolutional Neural Network (CNN) and a Time Series Transformer. 
The CNN comprises three convolutional layers with 32, 64, and 64 channels, respectively, and two linear layers with 128 and 32 units. It also includes max pooling with a kernel size of 4 and a flattening operation. The Time Series Transformer, in contrast, has no convolutional layers; it includes two transformer layers, a flattening step, and two linear layers with 32 units each. Both models use ReLU activations in their hidden layers and a softmax output activation.

\begin{table}[ht]
    \centering
    \begin{tabular}{lcc}
        \toprule
        \textbf{Parameter} & \textbf{CNN} & \textbf{Time Series Transformer} \\
        \midrule
        Number of Layers & 3 Conv + 2 Linear & 2 Transformer + 2 Linear \\
        Convolutional Layers & 3 (32, 64, 64 channels) & N/A \\
        Transformer Layers & N/A & 2 Layers \\
        Max Pooling & Yes (Kernel Size: 4) & N/A \\
        Linear Layers & 2 (128, 32 units) & 2 (32 units each) \\
        Activation Functions & ReLU & ReLU \\
        Output Activation & Softmax & Softmax \\
        \bottomrule
    \end{tabular}
    \caption{Network parameters of CNN and time series transformer networks.}
    \label{tab:network-params}
\end{table}

The CNN and the transformer were trained with similar hyperparameters.
Both models were trained for 200 epochs with a batch size of 1024, using early stopping with a patience of 100 epochs to prevent overfitting.
During training, we apply random data augmentations from \autoref{app:perturbations} with an intensity of $0.5$.
Both models are optimized with Adam, combined with a learning-rate scheduler that reduces the learning rate when training reaches a plateau.





\section{Adversarial Attack Experiment Details}\label{app:adversarial_attack}


An adversarial attack refers to crafting input data with the intent of fooling a machine learning model into making a misclassification~\citep{szegedy2013intriguing, goodfellow2014explaining}.
Formally, $\tilde{x}$ is called an adversarial example of $x$ if $\argmax_{y \in \set{Y}} F_y(\tilde{x}) \neq \argmax_{y \in \set{Y}} F_y(x)$ while $d(\tilde{x}, x) \leq \epsilon$ for some $\epsilon > 0$.
In practice, given the loss function $\set{L}$ defined in \cite{carlini2017towards}:
\begin{equation}
    \set{L}^{target}(\tilde{x}) = \max_{\tilde{y} \in \set{Y}\setminus\{y\}} F_{\tilde{y}}(\tilde{x}) - F_y(\tilde{x}),
\end{equation}
the goal is to maximize this margin over $\tilde{x}$, subject to $d(\tilde{x}, x) \leq \epsilon$, so that the model confidently predicts a wrong class.
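As an illustration of how this margin is maximized, the sketch below runs projected signed-gradient ascent (PGD) on a toy linear classifier $F(x) = Wx + b$, whose margin gradient is available in closed form. The choice of $d$ as the $\ell_\infty$ distance, the linear model, the step size, and the function names are all assumptions made for the example; our experiments attack the neural networks described in \autoref{app:settings}.

```python
import numpy as np

def cw_margin_loss(logits, y):
    """max_{y' != y} F_{y'} - F_y; positive exactly when some other class scores higher than y."""
    logits = np.asarray(logits, dtype=float)
    return np.delete(logits, y).max() - logits[y]

def pgd_linf_linear(x, y, W, b, eps, step, n_iter=20):
    """Maximize the margin loss of F(x) = W x + b inside the l_inf ball of radius eps around x."""
    x = np.asarray(x, dtype=float)
    x_adv = x.copy()
    for _ in range(n_iter):
        logits = W @ x_adv + b
        others = logits.copy()
        others[y] = -np.inf
        j = int(np.argmax(others))                # strongest competing class
        grad = W[j] - W[y]                        # margin gradient for a linear model
        x_adv = x_adv + step * np.sign(grad)      # signed-gradient ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project back onto the l_inf ball
    return x_adv
```

For a real network, the closed-form gradient would be replaced by automatic differentiation of $\set{L}^{target}$ with respect to the input.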

\autoref{tab:ucr_adversarial_extended} reports the clean top-1 accuracy, together with the adversarial accuracy, coverage, and set size of CP, RSCP, and TSCP, averaged over 20 adversarial attacks with $\epsilon$ uniformly distributed in $[0, 0.1]$.
In this comparison, we only consider datasets on which the classifier achieves a clean accuracy higher than 70\%.


On average, TSCP achieves the highest adversarial accuracy (75.3\% versus 74.1\% for CP and 74.4\% for RSCP) and a coverage of 98.0\%, close to that of RSCP (98.7\%) and well above that of CP (85.9\%), which falls below the 90\% target under attack.
Nevertheless, all methods experience a noticeable decline under adversarial conditions compared to the clean top-1 accuracy (84.1\% on average).
Performance also varies significantly across datasets, underscoring the influence of dataset characteristics on model robustness. For instance, on \textit{ECG200} and \textit{Plane}, all methods maintain high adversarial accuracy, whereas on \textit{Meat} and \textit{Coffee} all methods suffer a substantial drop; on \textit{Coffee}, TSCP also loses coverage (53.9\%), whereas RSCP does not (99.1\%).
Finally, TSCP produces slightly smaller prediction sets than RSCP on average (4.28 versus 4.32), while CP yields by far the smallest sets (1.09) at the cost of under-coverage.

\begin{table}[htb]
     \centering
     \caption{Comparison of CP~\citep{vovk2005algorithmic}, RSCP~\citep{gendler2021adversarially}, and TSCP (ours) across UCR~\citep{UCRArchive} datasets. 
     Results are averaged over 20 PGD~\citep{carlini2017towards} attacks with $\epsilon$ uniformly distributed in $[0, 0.1]$, for a target coverage of 90\% ($\alpha = 0.1$).
     RSCP and TSCP are augmented with $\sigma = 0.2$. Continued in \autoref{tab:ucr_adversarial_extended_2}.
     }
     \label{tab:ucr_adversarial_extended}
     \vspace{-0.5em}
     \adjustbox{width=0.8\textwidth}{%
     \begin{tabular}{lr|rrr|rrr|rrr}
         \toprule
         \textbf{Dataset} &\textbf{Acc.} &\multicolumn{3}{c}{\textbf{Adversarial Acc.}} &\multicolumn{3}{c}{\textbf{Coverage}} &\multicolumn{3}{c}{\textbf{Set-Size}} \\
         &      &CP &RSCP &TSCP &CP &RSCP &TSCP &CP &RSCP &TSCP \\
         \midrule
         \midrule         
         ArrowHead &72.0 &57.8 &58.0 &60.7 &78.7 &97.5 &99.4 &0.83 &2.41 &2.53 \\
         BME &93.3 &90.4 &90.4 &90.4 &100.0 &100.0 &100.0 &0.47 &3.00 &3.00 \\
         Beef &70.0 &51.6 &50.0 &49.0 &71.6 &100.0 &100.0 &0.40 &4.57 &4.57 \\
         BirdChicken &75.0 &72.1 &72.1 &72.1 &78.2 &100.0 &100.0 &0.74 &2.00 &2.00 \\
         CBF &92.6 &91.3 &91.3 &91.3 &97.2 &100.0 &100.0 &2.14 &3.00 &3.00 \\
         Car &73.3 &59.5 &59.5 &59.5 &78.4 &100.0 &100.0 &0.95 &4.00 &4.00 \\
         Chinatown &87.5 &87.5 &87.5 &87.5 &92.2 &100.0 &100.0 &0.69 &2.00 &2.00 \\
         CinC-ECG-torso &73.9 &69.9 &69.4 &68.8 &88.6 &99.9 &98.2 &0.52 &3.98 &3.84 \\
         Coffee &96.4 &54.8 &55.4 &53.9 &56.4 &99.1 &53.9 &0.61 &1.94 &0.99 \\
         Cricket-X &70.5 &65.6 &66.8 &62.5 &85.4 &99.7 &100.0 &1.22 &10.54 &10.19 \\
         Cricket-Z &70.3 &65.2 &66.1 &62.5 &85.0 &99.4 &99.7 &1.94 &10.20 &9.70 \\
         DiatomSizeReduction &94.1 &68.3 &68.3 &33.1 &75.0 &99.9 &99.3 &0.69 &3.99 &3.94 \\
         DistalPhalanxOutlineAgeGroup &85.2 &79.1 &78.9 &81.0 &92.1 &100.0 &100.0 &1.57 &2.93 &2.96 \\
         DistalPhalanxOutlineCorrect &77.7 &66.2 &61.2 &46.5 &93.3 &99.8 &100.0 &1.72 &1.97 &1.97 \\
         DistalPhalanxTW &74.8 &72.4 &73.4 &76.2 &90.6 &100.0 &100.0 &2.17 &4.48 &4.35 \\
         ECG200 &89.0 &86.4 &86.7 &84.6 &92.6 &100.0 &100.0 &0.79 &1.98 &1.94 \\
         ECG5000 &93.6 &92.2 &91.4 &92.4 &96.0 &99.4 &99.8 &0.93 &4.01 &4.46 \\
         ECGFiveDays &80.8 &72.9 &72.3 &83.7 &78.8 &97.8 &93.0 &0.77 &1.59 &1.25 \\
         Earthquakes &73.3 &71.1 &69.8 &74.8 &82.9 &94.9 &92.8 &0.39 &1.64 &1.42 \\
         ElectricDevices &73.8 &64.8 &66.7 &64.6 &84.0 &98.1 &98.1 &1.63 &6.37 &6.44 \\
         FaceAll &72.7 &67.9 &67.9 &72.0 &90.7 &87.0 &88.5 &0.92 &8.82 &7.60 \\
         FaceFour &79.5 &69.4 &68.2 &81.1 &88.1 &95.9 &99.3 &1.69 &2.38 &2.30 \\
         FacesUCR &83.1 &73.2 &73.7 &81.3 &91.4 &99.5 &99.5 &1.04 &11.24 &11.64 \\
         Fish &86.9 &48.7 &48.7 &48.7 &63.8 &100.0 &100.0 &0.70 &7.00 &7.00 \\
         FordA &90.1 &70.9 &72.2 &78.3 &75.9 &95.8 &97.4 &0.83 &1.66 &1.66 \\
         FordB &85.2 &72.5 &72.0 &76.4 &78.0 &99.6 &98.2 &1.04 &1.73 &1.66 \\
         FreezerRegularTrain &96.1 &76.3 &76.3 &76.3 &72.9 &100.0 &100.0 &0.34 &2.00 &2.00 \\
         FreezerSmallTrain &73.3 &73.0 &73.0 &73.0 &89.3 &100.0 &100.0 &0.32 &2.00 &2.00 \\
         GunPointAgeSpan &87.3 &87.3 &87.3 &87.3 &99.3 &100.0 &100.0 &1.63 &2.00 &2.00 \\
         GunPointMaleVersusFemale &93.4 &93.1 &93.1 &93.1 &99.6 &100.0 &100.0 &1.11 &2.00 &2.00 \\
         GunPointOldVersusYoung &89.2 &89.0 &89.0 &89.0 &91.9 &100.0 &100.0 &0.91 &2.00 &2.00 \\
         Gun-Point &97.3 &71.1 &70.8 &82.4 &87.9 &100.0 &100.0 &0.48 &1.97 &1.96 \\
         HandOutlines &84.7 &59.5 &62.4 &78.3 &59.2 &98.7 &98.5 &0.89 &1.94 &1.92 \\
         HouseTwenty &72.3 &72.3 &72.3 &72.3 &79.7 &100.0 &100.0 &0.62 &2.00 &2.00 \\
         InsectEPGRegularTrain &100.0 &100.0 &100.0 &100.0 &100.0 &100.0 &100.0 &0.65 &3.00 &3.00 \\
         InsectEPGSmallTrain &100.0 &100.0 &100.0 &100.0 &100.0 &100.0 &100.0 &0.51 &3.00 &3.00 \\
         ItalyPowerDemand &95.8 &90.0 &90.0 &90.0 &98.1 &100.0 &100.0 &0.70 &2.00 &2.00 \\
         LargeKitchenAppliances &72.0 &60.6 &62.3 &66.2 &84.1 &98.1 &97.3 &1.64 &2.80 &2.70 \\
         Lighting2 &75.4 &72.8 &72.4 &72.3 &82.4 &100.0 &100.0 &0.48 &1.96 &1.91 \\
         Lighting7 &71.2 &65.6 &64.4 &60.0 &85.1 &100.0 &100.0 &1.50 &6.31 &5.41 \\
         MALLAT &94.7 &78.4 &79.8 &85.8 &78.7 &100.0 &100.0 &0.49 &6.81 &6.89 \\
         Meat &85.0 &20.2 &21.3 &26.5 &23.9 &62.0 &61.1 &0.84 &1.97 &1.91 \\
         MiddlePhalanxOutlineAgeGroup &79.8 &75.8 &75.4 &76.3 &95.1 &89.5 &94.2 &1.70 &1.45 &2.01 \\
         MiddlePhalanxOutlineCorrect &70.8 &58.6 &58.5 &58.6 &91.9 &100.0 &100.0 &1.72 &2.00 &2.00 \\
         MixedShapesRegularTrain &88.9 &83.3 &83.3 &83.3 &93.8 &100.0 &100.0 &1.08 &5.00 &5.00 \\
         MixedShapesSmallTrain &80.8 &76.3 &76.3 &76.3 &90.8 &100.0 &100.0 &1.32 &5.00 &5.00 \\
         MoteStrain &77.5 &77.5 &78.0 &79.2 &87.7 &98.2 &94.2 &0.49 &1.73 &1.35 \\
          &\vdots &\vdots &\vdots &\vdots &\vdots &\vdots &\vdots &\vdots &\vdots &\vdots \\
         \bottomrule
     \end{tabular}}
 \end{table}


 \begin{table}[htb]
     \centering
     \caption{Continuation of \autoref{tab:ucr_adversarial_extended}}
     \label{tab:ucr_adversarial_extended_2}
     \vspace{-0.5em}
     \adjustbox{width=0.8\textwidth}{%
     \begin{tabular}{lr|rrr|rrr|rrr}
         \toprule
         \textbf{Dataset} &\textbf{Acc.} &\multicolumn{3}{c}{\textbf{Adversarial Acc.}} &\multicolumn{3}{c}{\textbf{Coverage}} &\multicolumn{3}{c}{\textbf{Set-Size}} \\
         &      &CP &RSCP &TSCP &CP &RSCP &TSCP &CP &RSCP &TSCP \\
         \midrule
         \midrule
         &\vdots &\vdots &\vdots &\vdots &\vdots &\vdots &\vdots &\vdots &\vdots &\vdots  \\
         NonInvasiveFatalECG-Thorax1 &83.3 &33.9 &35.4 &38.5 &70.7 &100.0 &100.0 &3.21 &32.22 &33.91 \\
         NonInvasiveFatalECG-Thorax2 &89.5 &37.2 &38.7 &45.9 &65.1 &99.9 &100.0 &2.20 &27.47 &27.23 \\
         Plane &96.2 &95.6 &95.4 &91.0 &99.2 &100.0 &100.0 &1.06 &4.93 &4.78 \\
         PowerCons &98.3 &95.3 &95.3 &95.3 &100.0 &100.0 &100.0 &0.75 &2.00 &2.00 \\
         ProximalPhalanxOutlineAgeGroup &83.4 &73.8 &75.5 &81.0 &80.3 &100.0 &100.0 &1.15 &2.99 &2.99 \\
         ProximalPhalanxOutlineCorrect &75.9 &66.7 &74.8 &72.3 &91.2 &100.0 &100.0 &1.55 &1.98 &1.96 \\
         ProximalPhalanxTW &73.0 &70.8 &68.8 &65.3 &88.7 &100.0 &100.0 &2.11 &4.56 &5.25 \\
         Rock &74.0 &75.4 &75.4 &75.4 &100.0 &100.0 &100.0 &3.42 &4.00 &4.00 \\
         SemgHandGenderCh2 &86.8 &86.1 &86.1 &86.1 &93.4 &100.0 &100.0 &0.66 &2.00 &2.00 \\
         SemgHandSubjectCh2 &76.2 &75.8 &75.8 &75.8 &89.2 &100.0 &100.0 &1.30 &5.00 &5.00 \\
         SonyAIBORobotSurface1 &83.7 &81.5 &81.5 &81.5 &89.4 &100.0 &100.0 &0.73 &2.00 &2.00 \\
         SonyAIBORobotSurface2 &81.1 &82.0 &82.0 &82.0 &92.7 &100.0 &100.0 &0.68 &2.00 &2.00 \\
         StarLightCurves &92.0 &83.7 &84.1 &86.4 &95.8 &100.0 &100.0 &1.06 &2.88 &2.85 \\
         Strawberry &86.9 &53.5 &55.5 &64.4 &58.7 &100.0 &100.0 &1.11 &1.95 &1.91 \\
         SwedishLeaf &85.6 &59.8 &63.3 &57.6 &82.4 &100.0 &100.0 &1.64 &13.96 &13.66 \\
         Symbols &82.0 &76.7 &76.4 &83.2 &88.1 &99.9 &99.8 &0.95 &4.77 &4.04 \\
         ToeSegmentation1 &77.2 &75.7 &75.7 &77.2 &79.2 &99.4 &95.4 &0.89 &1.81 &1.64 \\
         ToeSegmentation2 &86.2 &84.0 &84.1 &87.3 &93.9 &98.7 &95.6 &0.66 &1.45 &1.38 \\
         Trace &99.0 &82.4 &81.9 &83.1 &90.1 &100.0 &100.0 &0.68 &3.14 &3.18 \\
         TwoLeadECG &89.5 &65.4 &66.0 &64.5 &75.0 &95.4 &99.7 &0.44 &1.81 &1.92 \\
         Two-Patterns &99.9 &99.9 &100.0 &99.9 &100.0 &100.0 &100.0 &0.78 &3.80 &3.77 \\
         UMD &91.7 &92.8 &92.8 &92.8 &99.7 &100.0 &100.0 &1.71 &3.00 &3.00 \\
         UWaveGestureLibraryAll &94.7 &87.9 &89.0 &93.0 &95.8 &99.9 &100.0 &0.80 &6.91 &7.37 \\
         synthetic-control &97.3 &95.5 &95.0 &97.6 &99.4 &100.0 &100.0 &0.90 &2.84 &2.78 \\
         uWaveGestureLibrary-X &80.2 &75.8 &75.1 &75.6 &91.3 &99.6 &99.6 &1.34 &5.47 &5.17 \\
         uWaveGestureLibrary-Z &70.3 &65.2 &65.3 &69.3 &85.8 &99.9 &99.9 &1.33 &7.06 &7.12 \\
         wafer &98.9 &98.7 &98.7 &97.2 &99.8 &100.0 &100.0 &0.88 &1.93 &1.94 \\
         yoga &75.3 &67.5 &67.3 &67.4 &72.4 &99.9 &99.8 &1.04 &1.98 &1.96 \\
         \midrule
         Overall &84.1 &74.1  &74.4  &\textbf{75.3}  &85.9  &98.7  &\textbf{98.0}  &\textbf{1.09}  &4.32  &4.28  \\
         \bottomrule
     \end{tabular}}
 \end{table}



 \section{Domain Generalization for Time Series Classification}

In \autoref{fig:magnitude_versus_time}, we display the individual results of the domain generalization experiment on the in-house dataset. Each test set contains at least 2\,000 samples.
Maintaining high accuracy and coverage, together with a consistent set size, across these different domains indicates that the model is robust to unseen domains.
We observe that augmenting the input signal with jitter yields lower coverage in two additional domains compared with the other transformations.



\begin{figure}[htb]
    \centering
    \begin{subfigure}[b]{0.48\textwidth}
        \centering
        \includegraphics[width=\textwidth]{figs/domain generalization/jitter_scaling_conformal_0.1.pdf}
        \caption{Jitter \& Scaling.}
    \end{subfigure}
    \hspace{-0.5em}
    \begin{subfigure}[b]{0.48\textwidth}
        \centering
        \includegraphics[width=\textwidth]{figs/domain generalization/warping_conformal_0.1.pdf}
        \caption{Time \& Magnitude warping.}
    \end{subfigure}
    \caption{Results of domain generalization for a binary time-series classifier applied to different recordings of vehicle sensor data.}
    \label{fig:magnitude_versus_time}
\end{figure}