\section{Generalized Robustness via Conformalized Randomized Smoothing}\label{sec:method}

This section introduces smoothed conformal prediction for robustness certification.
For ease of exposition, we treat the input $x \in \R^{T}$ as a univariate signal throughout this section.


\subsection{Generalized Smoothed Classifier}\label{sec:temporal_smooth_classifier}

Following previous works on image classifiers~\citep{li2021tss, hao2022gsmooth}, we introduce a smooth classifier by randomly transforming inputs with parameters sampled from a smoothing distribution.
Although the definition is general and applies to any transformation, our focus is on time series augmentations.

Let us consider a transformation $\phi : \set{X}\times \set{Z} \to \set{X}$ that maps a time series $x$ to an augmented version $\tilde{x}$, where $\set{Z}$ denotes the set of transformation parameters.
In \autoref{app:perturbations}, we define the set of time series transformations $\phi$ considered in this work.


\begin{definition}[Generalized Smoothed Classifier]\label{def:smoothed_classfier}
    Let $\phi : \set{X}\times \set{Z} \to \set{X}$ be a transformation, $\pi \sim \set{D}_\pi$ a random variable taking values in $\set{Z}$, and let $F : \set{X} \to \prob(\set{Y})$ be a soft classifier.
    We define the $\phi$-smoothed version $G_\phi : \set{X} \to \prob(\set{Y})$ of $F$ as:
    \begin{equation}
        G_{\phi}(x) \eqdef \E_{\pi \sim \set{D}_\pi} \left[ F(\phi (x, \pi) )\right].
    \end{equation}
\end{definition}
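
In practice, the expectation in \autoref{def:smoothed_classfier} is rarely available in closed form.
A minimal sketch of how it could be approximated, assuming $K$ i.i.d. draws $\pi_1, \dots, \pi_K \sim \set{D}_\pi$ (the sample size $K$ and the estimator $\hat{G}_\phi$ are illustrative notation introduced here, not part of the definition), is the Monte Carlo average
\[
    \hat{G}_{\phi}(x) = \frac{1}{K} \sum_{k=1}^{K} F(\phi(x, \pi_k)),
\]
which is an unbiased estimate of $G_\phi(x)$ and converges to it as $K \to \infty$ by the law of large numbers, since the outputs of $F$ are bounded.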

Drawing from Theorem 1 in \cite{li2021tss}, it is possible to establish a robustness certificate for the $\phi$-smoothed classifier $G_{\phi}$.
In \autoref{sec:robustness_radii}, we discuss the robustness guarantees for a specific set of time series transformations.
In general, take an input $x \in \set{X}$ and a random variable $\pi$ taking values in $\set{Z}$.
Suppose that the soft classifier $F$ assigns $\tilde{x} = \phi(x, \pi)$ to class $y_A$ with probability at least $p_A$, and to the second most probable class with probability at most $p_B$.
To establish a robustness certificate, one must identify a set of perturbation parameters $\set{Z}_{\lambda} \subseteq \set{Z}$ and ensure that, for all perturbations $\lambda \in \set{Z}_{\lambda}$, the class predicted by $G_{\phi}$ for $\phi(x, \lambda)$ is the same as the class predicted for $x$, i.e. $\arg\max_{y \in \set{Y}} G_\phi(y \,|\, \phi(x, \lambda)) = \arg\max_{y \in \set{Y}} G_\phi(y \,|\, x)$.


Lastly, we establish a $\phi$-smoothed conformal score for $G_\phi$. 
Unlike the approach in \cite{gendler2021adversarially}, we incorporate a broader range of transformations.

\begin{definition}[Generalized Smoothed Score]\label{def:smoothed_score}
    Let $\phi : \set{X}\times \set{Z} \to \set{X}$ be a transformation, $\pi \sim \set{D}_\pi$ a random variable taking values in $\set{Z}$ and $S:\set{X}\times\set{Y} \to \mathbb{R}_{\geq 0}$ a scoring function. 
    We define the $\phi$-smoothed score function as:
    \begin{equation}
        S_{\phi}(x, y) \eqdef Q \left(\E_{\pi \sim \set{D}_\pi} \left[ S(\phi (x, \pi), y )\right]\right),
    \end{equation}
    where $Q:[0,1]\to \R$ represents the quantile function.
\end{definition}

% The idea is to identify a constant value $R \geq 0$, such that 


\subsection{Robustness Guarantees for Conformal Predictions Under General Transformations}\label{sec:theorem}

In the context of domain generalization, where the assumption of i.i.d. data no longer applies, it becomes crucial to estimate the potential shift between a baseline smooth score and one that comes from a different domain or has been attacked. 
This estimation is necessary to effectively bound the \textit{distribution shift}.
This approach extends the setting of \cite{gendler2021adversarially} to a broader range of input transformations.
We consider the non-conformity score function $S_\phi$ of \autoref{def:smoothed_score}, which lets us quantify the change that a transformation induces on $x^{(n+1)}$.
Our task is to ensure that $S_\phi$ satisfies the condition:
%
\begin{equation}\label{eq:inequality_conformity_score}
    S_\phi(\tilde{x}^{(n+1)}, y) \leq S_\phi( x^{(n+1)}, y) + R_\pi, \; \forall y \in \set{Y},    
\end{equation}
%
where $\tilde{x}^{(n+1)} = \phi(x^{(n+1)}, \pi)$ and $R_\pi$ is a constant that depends on $\pi$, is monotone in the sense that $R_{\pi_1} \leq R_{\pi_2}$ whenever $\pi_1 \leq \pi_2$, and vanishes when $\pi$ is zero.


The exact derivation of the robustness radius $R_\pi$ depends on the transformation considered.
More precisely, our objective is to certify robustness against a transformation $\phi$ that is resolvable by a smoothing transformation $\psi$, for transformation parameters contained in the set $\set{Z}_\lambda \subseteq \set{Z}$.
To achieve this, we first select a set of parameters $\{ \lambda_j \}_{j =1}^{N}$ from the parameter space $\set{Z}_\lambda$.
We then apply these parameters to the input, generating a collection of transformed inputs $\{\phi (x, \lambda_j)\}_{j =1}^{N}$.
Next, we use the classifier smoothed with the transformation $\psi$ to compute the class probabilities for each of these transformed inputs.
Following \citet[Corollary 2]{li2021tss}, for differentially resolvable transformations we define the guaranteed robustness radius $R_\pi$ as:
\begin{equation}
    R_\pi\; \eqdef\; \frac{\sigma}{2} \min_{1\leq j \leq N} \left( \Phi^{-1} (p_A^{(j)}) - \Phi^{-1}(p_B^{(j)})  \right).
\end{equation}
If $R_\pi$ is greater than the maximum interpolation error, i.e.
\begin{equation}\label{eq:max_interp_error}
    M_{\set{Z}_\lambda} = \max_{\lambda \in \set{Z}_\lambda} \min_{1 \leq j \leq N} \norm{\phi(x, \lambda) -\phi(x, \lambda_j)}_2\; <\; R_\pi,
\end{equation}
then it is guaranteed that, for all $\lambda \in \set{Z}_\lambda$, the smooth classifier continues to predict the original class.
Practically, given a transformation $\phi$, if the conditions identified in \autoref{tab:perturbations} are satisfied, $S_\phi$ provides a tight certified distance $R_\pi$ that satisfies \autoref{eq:inequality_conformity_score}.
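
As an illustrative calculation with hypothetical values (not taken from our experiments), and assuming that $\Phi$ denotes the standard Gaussian CDF as in \cite{li2021tss}, suppose $\sigma = 0.5$ and that the least favorable sampled parameter yields $p_A^{(j)} = 0.9$ and $p_B^{(j)} = 0.1$.
Since $\Phi^{-1}(0.9) \approx 1.2816$ and $\Phi^{-1}(0.1) \approx -1.2816$, we obtain
\[
    R_\pi \approx \frac{0.5}{2} \left( 1.2816 + 1.2816 \right) \approx 0.641,
\]
so in this example the certificate holds only if the sampled parameters $\{ \lambda_j \}_{j=1}^{N}$ cover $\set{Z}_\lambda$ densely enough that $M_{\set{Z}_\lambda} < 0.641$.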

In this context, $R_\pi$ is instrumental in linking the observed score $S_\phi(\tilde{x}^{(n+1)}, y)$ with the unobserved score $S_\phi(x^{(n+1)}, y)$ for any given $y \in \set{Y}$. 
Leveraging this relationship, we construct a prediction set $\set{C}_{\pi} (\tilde{x}^{(n+1)})$ resilient to input transformations with bounded deviation, following a decision rule:
%
\begin{equation}\label{eq:smooth_conformal_set}
    \set{C}_{\pi} (\tilde{x}^{(n+1)}) \eqdef \left\{ {y \in \set{Y} : S_\phi( \tilde{x}^{(n+1)}, y) \leq Q_{1 - \alpha} (\{S_\phi^{(i)}\}_{i\in \set{D}_{cal}}) + R_\pi} \right\},
\end{equation}
%
where $S_\phi^{(i)}$ is defined as $S_\phi(x^{(i)}, y^{(i)})$. 
This approach diverges from the standard split conformal method of \autoref{eq:conformal_set}, as our prediction set is derived by comparing the test score against an elevated threshold $Q_{1-\alpha} + R_\pi$.
This adjustment is dependent on both the magnitude of the transformation and the robustness of $S_\phi$, implying that a larger disturbance necessitates a higher threshold increase, while a more resilient $S_\phi$ requires a smaller increase. 

\begin{restatable}{theorem}{smooth}\label{th:smooth_coverage}
    Assume a set of samples $\{(x^{(i)}, y^{(i)})\}_{i=1}^{n+1}$ that are exchangeably drawn from an unknown distribution $\set{D}_{xy}$.
    Let $\phi: \set{X}\times \set{Z} \to \set{X}$ be a differentially resolvable transformation, let $\set{Z}_\lambda \subseteq \set{Z}$, let $\{ \lambda_j \}_{j =1}^{N} \subseteq \set{Z}_\lambda$ be a set of perturbation parameters, and let $G:\set{X} \to \prob(\set{Y})$ be a smooth classifier as in \autoref{def:smoothed_classfier} that predicts $y_A \in \set{Y}$ given $x = x^{(n+1)}$ (i.e. $y_A = \arg\max_{y \in \set{Y}} G(y\,|\,x)$).
    If, for every $j$, the class probabilities of $G$ satisfy:
    \begin{equation}
        G (y_A\,|\, \phi (x, \lambda_j))  \geq p_A^{(j)} \geq p_B^{(j)} \geq \max_{y \neq y_A} G (y\, |\,  \phi (x, \lambda_j)),
    \end{equation}
    and \autoref{eq:max_interp_error} holds, then the prediction set $\set{C}_\pi$ defined in \autoref{eq:smooth_conformal_set} satisfies:
    \begin{equation}
        \prob[y^{(n+1)} \in \set{C}_\pi (\phi(x^{(n+1)}, \pi))] \geq 1 - \alpha.
    \end{equation}
\end{restatable}


The proof is given in \autoref{app:proof_smooth_conformal_set}.
Thus, the prediction set $\set{C}_{\pi} (\tilde{x}^{(n+1)})$ includes the unknown target label $y^{(n+1)}$ with probability at least $1-\alpha$, regardless of the distribution $\set{D}_{xy}$, the sample size $n$, the choice of score function $S_\phi$ satisfying \autoref{eq:inequality_conformity_score}, and the magnitude of the adversarial perturbation $\pi$ generated by any attack algorithm.
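
The role of $R_\pi$ can be seen in a single step.
The following is a sketch of the intuition under the assumptions of \autoref{th:smooth_coverage}, not a substitute for the full proof.
Write $\hat{q} = Q_{1 - \alpha} (\{S_\phi^{(i)}\}_{i\in \set{D}_{cal}})$, a shorthand introduced here for illustration.
Applying \autoref{eq:inequality_conformity_score} with $y = y^{(n+1)}$, whenever $S_\phi(x^{(n+1)}, y^{(n+1)}) \leq \hat{q}$ we have
\[
    S_\phi(\tilde{x}^{(n+1)}, y^{(n+1)}) \;\leq\; S_\phi(x^{(n+1)}, y^{(n+1)}) + R_\pi \;\leq\; \hat{q} + R_\pi .
\]
Hence the event $y^{(n+1)} \in \set{C}_\pi(\tilde{x}^{(n+1)})$ contains the event that the clean score does not exceed $\hat{q}$, which has probability at least $1 - \alpha$ by exchangeability, provided that $\hat{q}$ is the finite-sample corrected split conformal quantile.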

\subsection{Bounding The Domain Generalization}

In this section, we broaden our examination to include the PAC theory and sketch guarantees for the \textit{Generalized Smoothed Classifier} to comply with the PAC criteria outlined in \autoref{eq:pac_set}.


Following \cite{park2020pac, park2022pac}, the goal is to find an upper bound $\bar{\xi}(k\,|\, m,\gamma )\in [0, 1]$ on the true success probability $\mu$, constructed from a sample $k \sim \text{Binom}(m, \mu)$, which holds with probability at least $1-\gamma$, where the binomial cumulative distribution function is defined as:
%
\begin{equation}
    \prob_{B} (k\,|\, m, \xi) = \sum_{i = 0}^{k} \begin{pmatrix}
        m \\ i
    \end{pmatrix} \xi^i (1 - \xi)^{m-i}.
\end{equation}
%
The PAC guarantee is expressed as:
\begin{equation}
    \prob_{k \sim \text{Binom}(m, \mu)}[ \mu \leq \bar{\xi}(k\,|\, m, \gamma)] \geq 1 - \gamma,
\end{equation}
%
where the upper bound $\bar{\xi}$ is defined as:
%
\begin{equation}
    \Bar{\xi} (k\,|\, m, \gamma) \eqdef \inf_{\xi \in [0, 1]} \left\{\xi\; :\; \prob_{B}(k\,|\, m, \xi) \leq \gamma \right\} \cup \{1\}.
\end{equation}
%
In other words, the true error $L_{\set{D}_{cal}}(\set{C})$ is bounded by the upper bound $\bar{\xi} (\hat{L}_{\set{D}_{cal}}(\set{C})\,|\, m, \gamma)$
with probability at least $1-\gamma$.
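
For intuition, consider the special case $k = 0$ (no errors observed among $m$ samples), worked here as an illustration with hypothetical numbers.
The sum then reduces to its first term, $\prob_{B}(0\,|\, m, \xi) = (1 - \xi)^m$, which is decreasing in $\xi$, so the constraint $(1 - \xi)^m \leq \gamma$ is equivalent to $\xi \geq 1 - \gamma^{1/m}$ and
\[
    \bar{\xi}(0\,|\, m, \gamma) = 1 - \gamma^{1/m}.
\]
With $m = 100$ and $\gamma = 0.05$ this gives $\bar{\xi} \approx 0.0295$: even with zero observed errors, the true error rate can only be certified to lie below roughly $3\%$ at confidence level $95\%$.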

In our analysis, we consider the conformal set $\set{C}_\pi$ of \autoref{eq:smooth_conformal_set} and bound the generalization error by adjusting the estimated threshold $\hat{\tau}$, defined in \autoref{eq:pac_tau}, by the robustness radius $R_\pi$ of the smooth classifier.
To do so, we consider a mapping function $\psi_\pi: \set{X}\times\set{Y} \to \mathbb{R}$ that incorporates the score function $S_\phi$ and the robustness radius $R_\pi$, and encodes the prediction set condition into a binary classification framework:
%
\begin{equation}
    \psi_\pi(\tilde{x}, y) \eqdef S_\phi(\tilde{x}, y) - Q_{1 - \alpha} (\{S_\phi^{(i)}\}_{i\in \mathcal{D}_{cal}}) - R_\pi,
\end{equation}
where $\tilde{x} = \tilde{x}^{(n+1)} = \phi (x^{(n+1)}, \pi)$.
We then define a binary function $M_\tau(t) = \mathbb{I}[t \leq 0]$, so that the confidence set $\set{C}_{\pi, \tau}$ can be rewritten as: 
%
\begin{equation}    
    \set{C}_{\pi, \tau}(\tilde{x}^{(n+1)}) = \left\{ y \in \mathcal{Y} : M_\tau(\psi_\pi(\tilde{x}^{(n+1)}, y)) = 1 \right\}.
\end{equation}
%
A PAC bound for the binary classifier $M_\tau$ therefore implies a PAC bound for the confidence set predictor $\set{C}_{\pi, \tau}$, ensuring that the prediction set adheres to the desired probability bounds.
We thus need to establish a PAC bound for $M_\tau$ under the modified encoding that incorporates $S_\phi$, $Q_{1 - \alpha}$, and $R_\pi$.
In practice, we can obtain an empirical threshold $\hat{\tau}_\pi$ defined as:
%
\begin{equation}\label{eq:smooth_pac_tau}
    \hat{\tau}_\pi = \sup_{\tau \in \set{T}} \left\{ \tau \,:\, \hat{L}_{\set{D}_{cal}}(\set{C}_{\pi, \tau}) \leq k(m, \xi, \gamma) \right\},
\end{equation}
%
which depends on the distribution $\pi$ and on the confidence level $k$ defined in \autoref{eq:pac_alpha}.


