\section{Certified Robustness For Temporal Transformations}\label{sec:tscp}

In this section, we introduce a temporal transformation for stretching and compressing time series. 
Building upon previous work~\citep{li2021tss}, we establish provable robustness guarantees.
We conclude by presenting our \textit{Temporal Smooth Conformal Predictor} (TSCP).


\subsection{Random Time Warping}

In practice, the time warping transformation $\phi$ is centered around a randomly chosen warp point $p\in\mathbb{N}$ with $0 < p < T$, and involves stretching and compressing different sections of $x$ while preserving its overall length.
The time warping process is characterized by two key parameters: $w_l, w_r\in \mathbb{N}$, representing the warp factors for the left and right sides of $p$, respectively. 
The warp factor $w_l$ is selected randomly from a uniform distribution in the range $(0, \ceil{\theta\cdot T})$, where $0 < \theta < 1, \theta \in \R$ denotes the warp size. 
The warp factor $w_r$ is then calculated to ensure a balanced warp that maintains the overall length $T$ of the series. 
Formally, this relationship is expressed as:
%
\begin{equation}
    w_r = w_l \cdot \frac{p}{T - p}.
\end{equation}
%
For each index $i$ in the original time series $x$, the corresponding index $\tilde{i}$ in the warped time series $\tilde{x}$ is determined based on $w_l$, $w_r$, and $p$. 
Specifically, the warped indices are computed as follows:
%
\begin{equation}
  \tilde{i} = \left\{\begin{array}{cc}
    i + \ceil{w_l\cdot \frac{p - i}{p}} &\text{for}\; i < p, \\
    i - \ceil{w_r\cdot \frac{i - p}{T - p}} &\text{for}\; i \geq p.
  \end{array}\right.
\end{equation}
%
Finally, the warped time series $\tilde{x}$ is constructed by mapping each value $x_i$ from the original time series $x$ to the corresponding warped index $\tilde{i}$.
Alg.~\ref{alg:time-warp} summarizes the overall procedure.


\begin{algorithm}
\caption{Random Time Warping of a Time Series}\label{alg:time-warp}
\begin{algorithmic}[1]
\Procedure{RandTimeWarp}{$x, \theta$}
    \State \textbf{initialize:} $p \sim \set{U}[1, T-1]$, $w_l \sim \set{U}[1, \ceil{\theta\cdot T-1}]$
    \State  $\tilde{x} \gets x$;  $w_r \gets w_l \cdot \frac{p}{T - p}$
    \For{$i = 0$ \textbf{to} $T-1$}
        \If{$i < p$}
            \State $\tilde{i} \gets i + \ceil{w_l \cdot \frac{p - i}{p}}$
        \Else
            \State $\tilde{i} \gets i - \ceil{w_r \cdot \frac{i - p}{T - p}}$
        \EndIf
        \State $\tilde{x}_{\tilde{i}} \gets x_i$
    \EndFor
    \State \Return $\tilde{x}$
\EndProcedure
\end{algorithmic}
\end{algorithm}
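
As a worked illustration of the index map, with hypothetical values chosen only for this example, take $T = 10$, $p = 5$, and $w_l = 2$, so that $w_r = 2 \cdot \frac{5}{10 - 5} = 2$.
Evaluating the two branches of Alg.~\ref{alg:time-warp} gives
%
\[
  \begin{array}{c|cccccccccc}
    i         & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\
    \hline
    \tilde{i} & 2 & 3 & 4 & 4 & 5 & 5 & 5 & 6 & 6 & 7
  \end{array}
\]
%
For instance, $i = 1$ yields $\tilde{i} = 1 + \ceil{2 \cdot \frac{4}{5}} = 3$, and $i = 8$ yields $\tilde{i} = 8 - \ceil{2 \cdot \frac{3}{5}} = 6$.
In this example, several source indices share the same target index, in which case the loop keeps the value written last. The target positions $0$, $1$, $8$, and $9$ are never written and therefore retain the values copied by the initialization $\tilde{x} \gets x$.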

\begin{table*}[htb]
    \centering
    \caption{Certified robustness radii for resolvable and differentially resolvable time series transformations.}
    \label{tab:perturbations}
    \begin{tabular}{llll}
        \toprule
         Type       &Transformation ($\pi$) &Distribution  &Certified Robustness Radius ($R_\pi$) \\
        \midrule
         Resolvable         &Jitter &$\delta \sim \set{N}(0, \sigma^2 I)$ &$\frac{\sigma}{2}\left( \Phi^{-1}(  {p_A}) - \Phi^{-1}(  {p_B}) \right)$  \\
                            &Scaling &$\delta \sim \set{N}(1, \sigma^2 I)$ &$\frac{1}{2}\left( \Phi^{-1}(  {p_A}) - \Phi^{-1}(  {p_B}) \right)$ \\
         Diff. Resolvable   &Magnitude-warp &$u \sim \set{N}(1, \sigma^2I)$ &$\frac{\sigma}{2} \min_{1 \leq j \leq N}\left( \Phi^{-1}(  {p_A}^{(j)}) - \Phi^{-1}(  {p_B}^{(j)}) \right)$ \\
                            &Time warp &$p \sim \set{U}[1, T-1]$ &$\frac{\sigma}{2} \min_{1 \leq j \leq N}\left( \Phi^{-1}(  {p_A}^{(j)}) - \Phi^{-1}(  {p_B}^{(j)}) \right)$ \\
                            &Window-warp &$p \sim \set{U}[0, T]$ &$\frac{\sigma}{2} \min_{1 \leq j \leq N}\left( \Phi^{-1}(  {p_A}^{(j)}) - \Phi^{-1}(  {p_B}^{(j)}) \right)$ \\
        \bottomrule
    \end{tabular}
\end{table*}


\paragraph{Numerical Complexity}

Here, we discuss the computational complexity of our time warping augmentation method in relation to earlier studies~\citep{le2016data, um2017data, iwana2021time}. 
Traditionally, time warping involves creating a cubic spline using a series of knots, a process that typically requires solving a tridiagonal system of equations. Once constructed, this spline is applied across the time series.
The complexity of this method is primarily dictated by the number of knots, $I$, and the time series length, $T$, resulting in an overall linear complexity of $\set{O}(I + T)$.
In contrast, our proposed method adopts a more straightforward approach. 
It consists of a single loop over the time series that performs a constant number of arithmetic operations per element. 
This results in a linear complexity of $\set{O}(T)$ and removes the $\set{O}(I)$ spline-construction term, including the tridiagonal solve. The saving is most noticeable when the number of knots $I$ is large. 


\subsection{Robustness Radii}\label{sec:robustness_radii}



Similarly to \citet{li2021tss}, we categorize the transformations into two types: \textit{resolvable} and \textit{differentially resolvable}.
In \autoref{tab:perturbations}, we report the certified radius for each individual transformation considered. 

Following \autoref{sec:theorem}, we now describe how to compute a tight and scalable upper bound $M$ on the interpolation error $M_{\set{Z}_\lambda}$ for resolvable and differentially resolvable time series transformations. 
The process begins by selecting a subset of transformation parameters $\{\lambda_j\}_{j=1}^N$ from $\set{Z}_\lambda$ and applying them to the input, resulting in a set of transformed inputs $\{\phi(x, \lambda_j)\}_{j=1}^N$. 
Subsequently, the class probabilities for each of these transformed inputs are calculated using a classifier that has been smoothed with the transformation $\psi$. 
The underlying principle is that if every parameter $\lambda \in \set{Z}_\lambda$ is sufficiently close to one of the sampled parameters $\lambda_j$, then the classifier can be considered robust against any parameter from the set $\set{Z}_\lambda$. 
This sampling scheme is what keeps the upper bound $M$ on the interpolation error both tight and scalable to compute.



\paragraph{Jitter} 
This case follows the application and bound derivation for smoothed classifiers previously described by \citet{cohen2019certified} and \citet{salman2019provably}. 
The convolution of the classifier with a Gaussian kernel, formally known as the Weierstrass transform~\citep{bilodeau1962weierstrass}, provides an alternative yet equivalent perspective on the certified robustness guarantees for predictions~\citep{salman2019provably}.

\paragraph{Scaling}
Scaling a time series is analogous to adjusting the contrast of an image. 
To obtain a certified robustness radius, we estimate the probability of the top predicted class, denoted $p_A$, and of the runner-up class, $p_B$, using Monte Carlo sampling (see Corollary 7 and Appendix D in \citet{li2021tss}). 
The robustness radius is then half the difference between the standard normal quantiles $\Phi^{-1}(p_A)$ and $\Phi^{-1}(p_B)$.
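
As a numerical illustration with hypothetical values, suppose that sampling yields $p_A = 0.90$ and $p_B = 0.05$.
Then $\Phi^{-1}(0.90) \approx 1.2816$ and $\Phi^{-1}(0.05) \approx -1.6449$, so the expressions in \autoref{tab:perturbations} evaluate to
%
\[
    \tfrac{1}{2}\left( 1.2816 + 1.6449 \right) \approx 1.463 \;\;\text{(scaling)},
    \qquad
    \tfrac{\sigma}{2}\left( 1.2816 + 1.6449 \right) \approx 0.732 \;\;\text{(jitter, assuming } \sigma = 0.5\text{)}.
\]
%
The values of $p_A$, $p_B$, and $\sigma$ are assumptions made only for this example and are not results from our experiments.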


\paragraph{Magnitude \& window warping}

For magnitude and window warping, computing an upper bound on the interpolation error amounts to finding the maximum value of the derivative of the cubic spline interpolation.
In general, an upper bound on the interpolation error of a transformation can be computed using stratified sampling~\citep{li2021tss}. 
An interval of transformation parameters, $\set{Z}_\lambda = [a, b]$, is sampled uniformly at $N$ parameters $\{\lambda_i\}_{i=1}^N$. 
For these parameters, functions $g_i: [a,b]\to \R_{\geq 0}$, representing the squared $\ell_2$ interpolation error between transformed samples, are defined as:
\begin{equation}
    \lambda \mapsto g_i (\lambda) \eqdef \norm{\phi(x, \lambda) - \phi(x, \lambda_i)}_2^2.    
\end{equation}
The goal is to find an upper bound, $M_i$, for each sub-interval $[\lambda_i, \lambda_{i+1}]$ such that: 
\begin{equation}
    M_i \geq \max_{\lambda_i \leq \lambda \leq \lambda_{i+1}}\, \min\{g_i(\lambda), g_{i+1}(\lambda)\}.
\end{equation}
This leads to an overall upper bound $\sqrt{M}\eqdef \max_{1\leq i \leq N-1} \sqrt{M_i}$, which is valid for the entire interval $\set{Z}_{\lambda}$.

A second level of sampling is then conducted within each sub-interval $[\lambda_i, \lambda_{i+1}]$, which is divided uniformly into $n$ parameters $\{\gamma_{i,j}\}_{j=1}^n$.
Given a global Lipschitz constant $L$ valid for all functions $\{g_i\}_{i=1}^N$, a closed-form expression for $M_i$ can be derived~\citep{li2021tss}.
In \autoref{app:lipschitz_constant}, we compute the derivative of the cubic spline interpolation and bound it by such a Lipschitz constant.
Increasing the number of first-level ($N$) or second-level ($n$) samples results in a tighter upper bound on the interpolation error. 
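
To make the role of the Lipschitz constant concrete, the following sketch derives one admissible choice of $M_i$; it is an illustration under stated assumptions and need not coincide with the exact expression of \citet{li2021tss}.
Assume that the second-level samples form a uniform grid $\lambda_i = \gamma_{i,1} < \dots < \gamma_{i,n} = \lambda_{i+1}$ with spacing $\Delta_i = (\lambda_{i+1} - \lambda_i)/(n-1)$, and write $h_i(\lambda) \eqdef \min\{g_i(\lambda), g_{i+1}(\lambda)\}$, which is $L$-Lipschitz whenever $g_i$ and $g_{i+1}$ are.
For any $\lambda \in [\gamma_{i,j}, \gamma_{i,j+1}]$, the Lipschitz property applied from both endpoints gives
%
\[
    h_i(\lambda) \leq \min\left\{ h_i(\gamma_{i,j}) + L(\lambda - \gamma_{i,j}),\; h_i(\gamma_{i,j+1}) + L(\gamma_{i,j+1} - \lambda) \right\}
    \leq \frac{h_i(\gamma_{i,j}) + h_i(\gamma_{i,j+1})}{2} + \frac{L \Delta_i}{2},
\]
%
where the second inequality bounds the minimum of two numbers by their average.
Consequently,
%
\[
    M_i \eqdef \max_{1 \leq j \leq n-1} \left( \frac{h_i(\gamma_{i,j}) + h_i(\gamma_{i,j+1})}{2} + \frac{L \Delta_i}{2} \right)
\]
%
is a valid upper bound on the sub-interval. The slack term $L\Delta_i / 2$ shrinks as $N$ or $n$ grows, which is consistent with the tightening described above.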




\paragraph{Time warping}
In the context of time warping, $\phi$ alters the indices of the time series based on the parameters $w_l$ and $w_r$, with the transformation centered around the point $p$. 
The derivative of the warping function $\phi$ essentially represents the rate of change of the warped indices with respect to the original indices. 
Formally,
%
\begin{equation}
    d\phi (i) = \left\{ \begin{array}{cc}
         1 - \frac{w_l}{p} &\text{for}\; i < p,  \\
         1 - \frac{w_r}{T - p} &\text{for}\; i \geq p,
    \end{array}\right.
\end{equation}
%
which follows from differentiating the warped-index map with the ceiling operators dropped.
Both derivatives are strictly below $1$ because $w_l, w_r > 0$, and they are non-negative whenever $w_l \leq p$ and $w_r \leq T - p$. Given that $w_l$ is selected in the range $(0, \lceil{\theta \cdot T}\rceil)$ and $w_r = w_l \cdot \frac{p}{T - p}$, we can bound both derivatives in terms of $\theta$, $p$, and $T$.



\subsection{Temporal Smooth Conformal Predictor}


\begin{algorithm}
    \caption{TSCP: Temporal Smooth Conformal Predictor}\label{alg:tscp}
    \begin{algorithmic}[1]
    \Require target error rate $\alpha \in (0, 1)$, transformation $\phi$, budget $\sigma$, smoothing samples $N$, data split into training $\set{D}_{tr}$ and calibration $\set{D}_{cal}$ sets.
    % \Ensure $y = x^n$
    \State Train a classifier $F$ on $\set{D}_{tr}$.
    \State Compute generalized smoothed scores $\{S_\phi^{(i)}\}_{i\in \set{D}_{cal}}$.
    \State Compute the empirical quantile $Q_{1-\alpha}(\{S_\phi^{(i)}\}_{i\in \set{D}_{cal}})$.
    \State Given $\tilde{x}^{(n+1)}$, construct $\set{C}_{\pi} (\tilde{x}^{(n+1)})$ as in \autoref{eq:smooth_conformal_set}.
    \end{algorithmic}
\end{algorithm}

Alg.~\ref{alg:tscp} presents our method. 
It is primarily designed to generate reliable predictions at a target error rate $\alpha$. 
It takes as input a transformation function $\phi$, a budget $\sigma$, and a number of smoothing samples $N$. 
The algorithm calculates generalized smoothed scores on the calibration set and determines the empirical quantile of these scores that corresponds to the target error rate.
The final step constructs a conformal prediction set for any new input, so that the predictions adhere to the target error rate.
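
As a hedged illustration of the quantile step in Alg.~\ref{alg:tscp}, assume the finite-sample correction that is standard in split conformal prediction; this is an assumption made for the example, and the definition accompanying \autoref{eq:smooth_conformal_set} takes precedence.
With $n = |\set{D}_{cal}|$ calibration scores, the empirical quantile is then taken at level
%
\[
    \frac{\ceil{(n+1)(1-\alpha)}}{n}.
\]
%
For the hypothetical values $n = 999$ and $\alpha = 0.1$, this level equals $\ceil{1000 \cdot 0.9}/999 = 900/999 \approx 0.901$, slightly above the nominal $1 - \alpha = 0.9$.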