\label{section:error}

As mentioned in Section~\ref{subsect:Prob def}, the ResNet model in~\eqref{eq:ResNet} can be seen as the Euler discretization of the neural ODE~\eqref{eq:nODE} evaluated at continuous depth $t=1$:
\begin{equation}
\label{eq:error}
x(1)=\Phi(1,u)\approx u + f(u)=y.
\end{equation}

Our initial goal, related to Problem~\ref{def:error bound}, is to evaluate this approximation error for a given set of inputs $u\in\mathcal{X}_{in}$.
This is done below using a Taylor expansion in its Lagrange remainder form, later combined with tools dedicated to reachability analysis.


\subsection{Lagrange remainder}
\label{subsect:lagrange}

The Taylor expansion of the state trajectory $x(t)$ of the neural ODE~\eqref{eq:nODE} at $t=0$ is given by the infinite sum:
\begin{equation}
x(t)=x(0)+t\frac{dx(0)}{dt}+\frac{t^2}{2!}\frac{d^2x(0)}{dt^2}+\frac{t^3}{3!}\frac{d^3x(0)}{dt^3}+\dots
\label{eq:taylor}
\end{equation}

The Lagrange remainder theorem makes it possible to truncate~\eqref{eq:taylor} without introducing any approximation error, hence preserving the above equality.
We only state the result for a truncation at Taylor order $2$, which is the case of interest in our work.

\begin{proposition}[Lagrange remainder~\cite{rudin1976principles}]
\label{prop:lagrange}
There exists $t^*\in[0,t]$ such that
\begin{equation}
x(t)=x(0)+t\frac{dx(0)}{dt}+\frac{t^2}{2!}\frac{d^2x(t^*)}{dt^2}
\label{eq:lagrange}
\end{equation}
\end{proposition}

Notice that in~\eqref{eq:lagrange}, the second-order derivative $\frac{d^2x}{dt^2}$ is evaluated at some $t^*\in[0,t]$ instead of at $0$ as in the Taylor series~\eqref{eq:taylor}.
Although the truncation in Proposition~\ref{prop:lagrange} provides a much more manageable expression than the infinite sum in~\eqref{eq:taylor}, the main difficulty is that this result only states the existence of a $t^*\in[0,t]$ satisfying the equality in~\eqref{eq:lagrange}, while its actual value remains unknown.


\subsection{Error function}
\label{subsect:lagrange evaluation}

To compare the continuous state $x(t)$ with the discrete output of the ResNet, the state of the neural ODE~\eqref{eq:nODE} should be evaluated at depth $t=1$.

The first term of the right-hand side in~\eqref{eq:lagrange} is the known initial condition of the neural ODE~\eqref{eq:nODE}: $x(0)=u$.

The second term is provided by the definition of the vector field of the neural ODE~\eqref{eq:nODE}, and thus reduces to:
$$t\frac{dx(0)}{dt}=1\cdot f(x(0))=f(u).$$

The second derivative appearing in the third term of~\eqref{eq:lagrange} can be computed using the chain rule as follows:

\begin{align*}
\frac{d^2x(t)}{dt^2} 
&= \frac{df(x(t))}{dt}\\
&= \frac{\partial f(x(t))}{\partial t}+\frac{\partial f(x(t))}{\partial x}\frac{dx(t)}{dt}\\
&= \frac{\partial f(x(t))}{\partial t}+f'(x(t))f(x(t)).
\end{align*}


In the setting of Section~\ref{section:preliminaries}, the function $f$ does not depend explicitly on the depth $t$, since it is defined as a single residual block with classical layers.
Therefore, the partial derivative $\frac{\partial f(x(t))}{\partial t}$ is equal to $0$, and at $t=1$ the third term of~\eqref{eq:lagrange} reduces to:
$$\frac{t^2}{2!}\frac{d^2x(t^*)}{dt^2}=\frac{1}{2}f'(x(t^*))f(x(t^*)).$$
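
As an illustration only, assume a hypothetical residual block made of a single layer $f(x)=\tanh(Wx+b)$, where the weight matrix $W\in\mathbb{R}^{n\times n}$ and the bias $b\in\mathbb{R}^n$ are symbols introduced here and not taken from Section~\ref{section:preliminaries}. Its Jacobian is $f'(x)=\mathrm{diag}\left(1-\tanh^2(Wx+b)\right)W$, with the square taken element-wise, so that under this assumption the term above would read:
$$\frac{1}{2}f'(x(t^*))f(x(t^*))=\frac{1}{2}\,\mathrm{diag}\left(1-\tanh^2(Wx(t^*)+b)\right)W\tanh(Wx(t^*)+b).$$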

We can thus rewrite~\eqref{eq:lagrange} as an equation expressing the output of the neural ODE in terms of the output of the ResNet (for the same initial state/input $u$) and an error term:
\begin{equation}
    \Phi(1,u)= (u + f(u))+\varepsilon(u),
\label{eq:nODE~ResNet}
\end{equation}
where the approximation error between our models for this particular input $u$ is expressed by the Lagrange remainder of Taylor order 2:
\begin{equation}
    \varepsilon(u)=\frac{1}{2}f'(x(t^*))f(x(t^*)),
\label{eq:error function}
\end{equation}
with $x(t^*)=\Phi(t^*,u)$ for a fixed but unknown $t^*\in[0,1]$.
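
As a sanity check of~\eqref{eq:error function}, consider the following hypothetical scalar example (introduced here for illustration, not one of the models studied in this paper): $f(x)=ax$ with $a\in\mathbb{R}$, for which the flow is known in closed form, $\Phi(t,u)=e^{at}u$. The exact error and the Lagrange remainder are then
\begin{align*}
\Phi(1,u)-(u+f(u))&=(e^a-1-a)\,u,\\
\frac{1}{2}f'(x(t^*))f(x(t^*))&=\frac{1}{2}a^2e^{at^*}u.
\end{align*}
For $a=1$, both expressions coincide for $t^*=\ln(2(e-2))\approx 0.362\in[0,1]$, which is consistent with Proposition~\ref{prop:lagrange}.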

Equation~\eqref{eq:nODE~ResNet} can equivalently be rearranged to express the output of the ResNet in terms of that of the neural ODE:
\begin{equation}
    u + f(u)=\Phi(1,u)-\varepsilon(u).
\label{eq:ResNet~nODE}
\end{equation}

The error function $\varepsilon:\mathbb{R}^n\rightarrow\mathbb{R}^n$, appearing with a positive sign in~\eqref{eq:nODE~ResNet} and with a negative sign in~\eqref{eq:ResNet~nODE}, is defined in~\eqref{eq:error function} only for a specific input $u$.
However, in the context of Problem~\ref{def:error bound}, we are interested in analyzing the approximation error between both models over an input set $\mathcal{X}_{in} \subseteq \mathbb{R}^n$.
In addition, since the specific value of $t^*$ is unknown, we need to bound~\eqref{eq:error function} for all possible values of $t^* \in [0,1]$.
Therefore, in the next subsections, we focus on converting the equalities~\eqref{eq:nODE~ResNet}--\eqref{eq:ResNet~nODE} into set inclusions over all $u\in\mathcal{X}_{in}$ and $t^* \in [0,1]$.


\subsection{Bounding the error set}
\label{subsect:bounding error}

The reachable error set $\mathcal{R}_\varepsilon(\mathcal{X}_{in})$ introduced in Problem~\ref{def:error bound} can be bounded using the error function~\eqref{eq:error function}, by letting the unknown $t^*$ range over the whole interval $[0,1]$:
\begin{align}
\mathcal{R}_\varepsilon(\mathcal{X}_{in})&=\left\{\Phi(1, u)-(u+f(u))~|~u\in\mathcal{X}_{in}\right\}\nonumber\\
&\subseteq\left\{\left.\frac{1}{2}f'(\Phi(t^*,u))f(\Phi(t^*,u))~\right|~t^* \in [0,1],~u\in\mathcal{X}_{in}\right\}.
\label{eq:error set 1}
\end{align}
To solve Problem~\ref{def:error bound}, our objective is thus to compute an over-approximation $\Omega_\varepsilon(\mathcal{X}_{in})$ bounding the error set: $\mathcal{R}_\varepsilon(\mathcal{X}_{in})\subseteq\Omega_\varepsilon(\mathcal{X}_{in})$.

The first step (corresponding to line 1 in Algorithm~\ref{alg:nODE}) is to compute the reachable tube of all possible states that can be reached by the neural ODE~\eqref{eq:nODE} over the whole range $t\in[0,1]$ and for any initial state $x(0)=u\in\mathcal{X}_{in}$.
This reachable tube can be defined similarly to $\mathcal{R}_{\text{neural ODE}}(\mathcal{X}_{in})$ in Section~\ref{subsect:Prob def}, but for all possible depths $t\in[0,1]$ instead of only the final one:
\begin{equation*}
\label{eq:reach tube}
\mathcal{R}^{\text{tube}}_{\text{neural ODE}}(\mathcal{X}_{in})=\{\Phi(t,u)\in \mathbb{R}^n \mid t\in[0,1],~u\in \mathcal{X}_{in}\}.
\end{equation*}
Since in most cases this set cannot be computed exactly, we instead use off-the-shelf reachability analysis toolboxes to compute an over-approximating set $\Omega^{\text{tube}}_{\text{neural ODE}}(\mathcal{X}_{in})$ such that $\mathcal{R}^{\text{tube}}_{\text{neural ODE}}(\mathcal{X}_{in})\subseteq\Omega^{\text{tube}}_{\text{neural ODE}}(\mathcal{X}_{in})$.

Using the above definition of the reachable tube, the error set can then be bounded by replacing $\Phi(t^*,u)$ (with $t^* \in [0,1]$ and $u \in \mathcal{X}_{in}$) in~\eqref{eq:error set 1} by $x\in\mathcal{R}^{\text{tube}}_{\text{neural ODE}}(\mathcal{X}_{in})$:
\begin{align}
\mathcal{R}_\varepsilon(\mathcal{X}_{in})
&\subseteq\left\{\left.\frac{1}{2}f'(x)f(x)~\right|~x\in\mathcal{R}^{\text{tube}}_{\text{neural ODE}}(\mathcal{X}_{in})\right\}\nonumber\\
&\subseteq\left\{\left.\frac{1}{2}f'(x)f(x)~\right|~x\in\Omega^{\text{tube}}_{\text{neural ODE}}(\mathcal{X}_{in})\right\}.
\label{eq:error set 2}
\end{align}
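
To illustrate these inclusions on a case where every set is available in closed form, consider the hypothetical scalar system $f(x)=x$ (chosen here for illustration only) with $\mathcal{X}_{in}=[1,2]$, so that $\Phi(t,u)=e^tu$ and $\frac{1}{2}f'(x)f(x)=\frac{x}{2}$. Then
\begin{align*}
\mathcal{R}_\varepsilon(\mathcal{X}_{in})&=\{(e-2)u \mid u\in[1,2]\}=[e-2,\,2(e-2)]\approx[0.718,\,1.437],\\
\mathcal{R}^{\text{tube}}_{\text{neural ODE}}(\mathcal{X}_{in})&=\{e^tu \mid t\in[0,1],~u\in[1,2]\}=[1,\,2e]\approx[1,\,5.437],\\
\left\{\left.\tfrac{x}{2}~\right|~x\in\mathcal{R}^{\text{tube}}_{\text{neural ODE}}(\mathcal{X}_{in})\right\}&=\left[\tfrac{1}{2},\,e\right]\approx[0.5,\,2.718].
\end{align*}
The last interval contains the exact error set, as expected, and is strictly larger because $t^*$ is allowed to range over the whole interval $[0,1]$.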

The next step, in line 2 of Algorithm~\ref{alg:nODE}, is to over-approximate this error set $\mathcal{R}_\varepsilon(\mathcal{X}_{in})$.
One possible approach to achieve this is to define the static function $\varepsilon=\frac{1}{2}f'(x)f(x)$ and apply to it some set-propagation techniques (such as interval arithmetic~\cite{jaulin2001interval}, Taylor models~\cite{makino2003taylor}, or affine arithmetic~\cite{de2004affine}) to bound the set of output errors $\varepsilon$ corresponding to any state $x\in\Omega^{\text{tube}}_{\text{neural ODE}}(\mathcal{X}_{in})$ in the reachable tube over-approximation.
An alternative approach, which provided a tighter error bounding set in the particular case of the numerical example presented in Section~\ref{section:expirements}, is to define the discrete-time nonlinear system $x^+=\frac{1}{2}f'(x)f(x)$, and then use existing reachability analysis toolboxes to over-approximate the reachable set of this system after one time step, which corresponds to bounding the image of the error function.
Note that in this case, it is important that this final reachable set is computed as a single step, and not decomposed into a sequence of smaller intermediate steps whose iterative updates of the internal state would have no mathematical meaning for the static (stateless) error function.

Since the reachability methods used in the first two steps of Algorithm~\ref{alg:nODE} guarantee that the computed sets over-approximate the output or reachable sets of interest, the equalities and set inclusions in~\eqref{eq:error set 1}--\eqref{eq:error set 2} yield a solution to Problem~\ref{def:error bound}.
\begin{theorem}
\label{thm:error}
The set $\Omega_\varepsilon(\mathcal{X}_{in})$ obtained from the second step described above solves Problem~\ref{def:error bound}:
$$\mathcal{R}_\varepsilon(\mathcal{X}_{in})=\left\{\Phi(1, u)-(u+f(u))~|~u\in\mathcal{X}_{in}\right\}\subseteq\Omega_\varepsilon(\mathcal{X}_{in}).$$
\end{theorem}

Note that the error bound in Theorem~\ref{thm:error} is defined as a set in the state space of the neural ODE.
This differs from the approach in~\cite{sander2022residualneuralnetworksdiscretize}, where the error bound is defined as a positive scalar.

A second and more important difference from that work is the tightness of the obtained error bounds.
Indeed, if we adapt the results from~\cite{sander2022residualneuralnetworksdiscretize} to the context of our framework described in Section~\ref{section:preliminaries}, their error bound is expressed as: 
$$\varepsilon\leq \frac{e^L-1}{L}\left\|\frac{1}{2}f'(x)f(x)\right\|_\infty,~\forall x\in\mathcal{R}^{\text{tube}}_{\text{neural ODE}}(\mathcal{X}_{in}),$$
where $L$ is a Lipschitz constant of the neural ODE vector field.
The term $\left\|\frac{1}{2}f'(x)f(x)\right\|_\infty$ can be obtained by first over-approximating the error set by $\Omega_\varepsilon(\mathcal{X}_{in})$ in the same way as we did. However, taking the infinity norm forces us to first expand this set to make it symmetric around $0$, and then to keep only the maximum value among its components, which amounts to a second expansion of this set into a hypercube whose width along all dimensions is the largest width of the previous set.
In addition, for any system with non-zero Lipschitz constant, the factor $\frac{e^L-1}{L}$ is always greater than $1$, which increases this error bound even more.
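
As a rough illustration of this gap, assume the hypothetical scalar system $f(x)=x$ with $\mathcal{X}_{in}=[1,2]$ (an example introduced here, not taken from Section~\ref{section:expirements}), for which $L=1$, the reachable tube is $[1,2e]$, and $\frac{1}{2}f'(x)f(x)=\frac{x}{2}$ takes values in $\left[\frac{1}{2},e\right]$ over this tube. Under our reading of the scalar bound above as a supremum over the tube, it would evaluate to
$$\frac{e^L-1}{L}\sup_{x\in[1,2e]}\left|\frac{x}{2}\right|=(e-1)\,e\approx 4.671,$$
that is, an error interval $[-4.671,\,4.671]$, to be compared with the set-based bound $\left[\frac{1}{2},e\right]\approx[0.5,\,2.718]$ and the exact error set $[e-2,\,2(e-2)]\approx[0.718,\,1.437]$.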

In summary, this scalar error bound adds two sources of conservatism (the infinity norm and the factor $\frac{e^L-1}{L}$) compared with our proposed set-based error bound.
The comparison of both approaches is illustrated in the numerical example of Section~\ref{section:expirements}.



\subsection{Verification proxy}

To address Problem~\ref{def:safety verification}, we leverage the similarity in behavior of the neural ODE and ResNet models to verify safety properties on one model using the reachable set of the other, combined with the error bound from Theorem~\ref{thm:error}. Specifically, we want to verify whether the reachable output set of a model is contained in the safe set $\mathcal{X}_{s}$, i.e., $\mathcal{R}(\mathcal{X}_{in}) \subseteq \mathcal{X}_s$.

We first focus on the case of Algorithm~\ref{alg:nODE} to verify the safety property on the neural ODE, based on the reachability analysis of the ResNet.
This first verification proxy relies on the set-based version of~\eqref{eq:nODE~ResNet} using the Minkowski sum:
\begin{equation}
    \mathcal{R}_{\text{neural ODE}}(\mathcal{X}_{in}) \subseteq \Omega_{\text{ResNet}}(\mathcal{X}_{in}) + \Omega_{\varepsilon}(\mathcal{X}_{in}),
    \label{eq:nODE~ResNet-sb}
\end{equation}
stating that the reachable output set of the neural ODE is contained in the output set over-approximation of the ResNet $\Omega_{\text{ResNet}}(\mathcal{X}_{in})$, expanded by the bounding set of the error $\Omega_{\varepsilon}(\mathcal{X}_{in})$ obtained after applying the first two lines of Algorithm~\ref{alg:nODE} as described in Section~\ref{subsect:bounding error}.

Therefore, this verification procedure is achieved as in Algorithm~\ref{alg:nODE}, by first using existing set-propagation or reachability analysis tools to compute an over-approximation $\textcolor{blue}{\Omega_{\text{ResNet}}(\mathcal{X}_{in})}$ of the ResNet output set (line 3).
Then in line 4, an over-approximation of the neural ODE output set can be deduced from~\eqref{eq:nODE~ResNet-sb} by taking the Minkowski sum of $\textcolor{blue}{\Omega_{\text{ResNet}}(\mathcal{X}_{in})}$ and our error bound $\textcolor{red}{\Omega_{\varepsilon}(\mathcal{X}_{in})}$.
If $\Omega_{\text{neural ODE}}(\mathcal{X}_{in})$ is contained in the safe set $\textcolor{OliveGreen}{\mathcal{X}_{s}}$, then the neural ODE satisfies the safety property; otherwise, the result is inconclusive (lines 5--9).
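
The steps of this procedure can be traced by hand on a hypothetical scalar example (introduced here for illustration only): $f(x)=x$ and $\mathcal{X}_{in}=[1,2]$, where we assume that each over-approximation is computed exactly as an interval. Lines 1 to 4 of Algorithm~\ref{alg:nODE} would then give
\begin{align*}
\Omega^{\text{tube}}_{\text{neural ODE}}(\mathcal{X}_{in})&=[1,\,2e],\\
\Omega_{\varepsilon}(\mathcal{X}_{in})&=\left\{\left.\tfrac{x}{2}~\right|~x\in[1,\,2e]\right\}=\left[\tfrac{1}{2},\,e\right],\\
\Omega_{\text{ResNet}}(\mathcal{X}_{in})&=\{2u \mid u\in[1,2]\}=[2,\,4],\\
\Omega_{\text{ResNet}}(\mathcal{X}_{in})+\Omega_{\varepsilon}(\mathcal{X}_{in})&=\left[\tfrac{5}{2},\,4+e\right]\approx[2.5,\,6.718].
\end{align*}
This interval contains the exact output set $\mathcal{R}_{\text{neural ODE}}(\mathcal{X}_{in})=[e,\,2e]\approx[2.718,\,5.437]$. With a safe set $\mathcal{X}_s=[0,7]$ the algorithm would return \textbf{Safe}, whereas with $\mathcal{X}_s=[0,6]$ it would return \textbf{Unknown}, even though the exact output set lies in $[0,6]$.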


\begin{algorithm}[htb]
\caption{Safety Verification Framework for neural ODE based on ResNet}
\label{alg:nODE}
\textbf{Input}: a neural ODE, an input set $\mathcal{X}_{in}$ and a safe set $\mathcal{X}_s$.\\
\textbf{Output}: \textbf{Safe} or \textbf{Unknown}.
\begin{algorithmic}[1] %[1] enables line numbers
\STATE compute an over-approximation of the reachable tube of the neural ODE $\Omega^{\text{tube}}_{\text{neural ODE}}(\mathcal{X}_{in})$;
\STATE compute an over-approximation $\textcolor{red}{\Omega_{\varepsilon}(\mathcal{X}_{in})}$ of the error set $\left\{\frac{1}{2}f'(x)f(x) \mid x \in \Omega^{\text{tube}}_{\text{neural ODE}}(\mathcal{X}_{in})\right\}$;
\STATE compute the over-approximation of the ResNet output $\textcolor{blue}{\Omega_{\text{ResNet}}(\mathcal{X}_{in})}$;
\STATE deduce an over-approximation of the neural ODE output\\ $\Omega_{\text{neural ODE}}(\mathcal{X}_{in}) = \textcolor{blue}{\Omega_{\text{ResNet}}(\mathcal{X}_{in})}+\textcolor{red}{\Omega_{\varepsilon}(\mathcal{X}_{in})}$;
\IF {$\Omega_{\text{neural ODE}}(\mathcal{X}_{in}) \subseteq \textcolor{OliveGreen}{\mathcal{X}_{s}}$}
\STATE return \textbf{Safe}
\ELSE
\STATE return \textbf{Unknown}
\ENDIF
\end{algorithmic}
\end{algorithm}

\bigskip
Reversing the roles, the case of verifying the ResNet based on the reachability analysis of the neural ODE is described in Algorithm~\ref{alg:ResNet}.
This case is very similar to the previous one, so we focus here on the main differences with Algorithm~\ref{alg:nODE}.
The first difference is that in~\eqref{eq:ResNet~nODE}, the term representing the approximation error between the models appears with a negative sign.
Therefore, when converting this equation into a set inclusion similarly to~\eqref{eq:nODE~ResNet-sb}, we must take the Minkowski sum with the negation of the error set, and not a set difference, which is not the correct set operation in our case.
We thus introduce the negative error set 
$$\Omega_{-\varepsilon}(\mathcal{X}_{in}) = \{-\varepsilon \mid \varepsilon \in \Omega_\varepsilon(\mathcal{X}_{in})\},$$
in order to convert~\eqref{eq:ResNet~nODE} into its set-based notation as follows:
\begin{equation}
    \mathcal{R}_{\text{ResNet}}(\mathcal{X}_{in}) \subseteq \Omega_{\text{neural ODE}}(\mathcal{X}_{in}) + \Omega_{-\varepsilon}(\mathcal{X}_{in}).
    \label{eq:ResNet~nODE-sb}
\end{equation}

The second difference is that in line 3 of Algorithm~\ref{alg:ResNet}, we compute an over-approximation of the reachable set of the neural ODE, using any classical tool for reachability analysis of continuous-time nonlinear systems, and then add it to the negative error set (line 4) to obtain an over-approximation of the ResNet output set.
This final set can then similarly be used to verify the satisfaction of the safety property on the ResNet model.
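
For instance, on a hypothetical scalar example (introduced here for illustration only) with $f(x)=x$ and $\mathcal{X}_{in}=[1,2]$, and assuming that all sets are computed exactly as intervals so that $\Omega_{\varepsilon}(\mathcal{X}_{in})=\left[\frac{1}{2},e\right]$, we would obtain
\begin{align*}
\Omega_{-\varepsilon}(\mathcal{X}_{in})&=\left[-e,\,-\tfrac{1}{2}\right],\\
\Omega_{\text{neural ODE}}(\mathcal{X}_{in})&=\{e\,u \mid u\in[1,2]\}=[e,\,2e],\\
\Omega_{\text{neural ODE}}(\mathcal{X}_{in})+\Omega_{-\varepsilon}(\mathcal{X}_{in})&=\left[0,\,2e-\tfrac{1}{2}\right]\approx[0,\,4.937],
\end{align*}
which contains the exact ResNet output set $\{2u \mid u\in[1,2]\}=[2,4]$. In contrast, the set difference $[e,\,2e]\setminus\left[\frac{1}{2},\,e\right]=(e,\,2e]$ does not contain $[2,4]$, which illustrates why it is not the appropriate operation here.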


\begin{algorithm}[htb]
\caption{Safety Verification Framework for ResNet based on neural ODE}
\label{alg:ResNet}
\textbf{Input}: a ResNet, an input set $\mathcal{X}_{in}$ and a safe set $\mathcal{X}_s$.\\
\textbf{Output}: \textbf{Safe} or \textbf{Unknown}.
\begin{algorithmic}[1] %[1] enables line numbers
\STATE compute an over-approximation of the reachable tube of the neural ODE $\Omega^{\text{tube}}_{\text{neural ODE}}(\mathcal{X}_{in})$;
\STATE compute an over-approximation $\textcolor{red}{\Omega_{-\varepsilon}(\mathcal{X}_{in})}$ of the negative error set $\left\{-\frac{1}{2}f'(x)f(x) \mid x \in \Omega^{\text{tube}}_{\text{neural ODE}}(\mathcal{X}_{in})\right\}$;
\STATE compute the over-approximation of the neural ODE output $\Omega_{\text{neural ODE}}(\mathcal{X}_{in})$;
\STATE deduce an over-approximation of the ResNet output\\ $\textcolor{blue}{\Omega_{\text{ResNet}}(\mathcal{X}_{in})} = \Omega_{\text{neural ODE}}(\mathcal{X}_{in})+\textcolor{red}{\Omega_{-\varepsilon}(\mathcal{X}_{in})}$;
\IF {$\textcolor{blue}{\Omega_{\text{ResNet}}(\mathcal{X}_{in})} \subseteq \textcolor{OliveGreen}{\mathcal{X}_{s}}$}
\STATE return \textbf{Safe}
\ELSE
\STATE return \textbf{Unknown}
\ENDIF
\end{algorithmic}
\end{algorithm}

\begin{theorem}[Soundness]
\label{sound}
If either Algorithm~\ref{alg:nODE} or Algorithm~\ref{alg:ResNet} returns \textbf{\em Safe}, then the safety property in the sense of Problem~\ref{def:safety verification} holds\em~\cite{liang2022safetyverificationneuralnetworks}.
\end{theorem}

The soundness of the verification framework follows from the fact that both algorithms rely on over-approximations of the true reachable sets. Specifically,~\eqref{eq:nODE~ResNet-sb} ensures that $\mathcal{R}_{\text{neural ODE}}(\mathcal{X}_{in}) \subseteq \Omega_{\text{neural ODE}}(\mathcal{X}_{in})$, and~\eqref{eq:ResNet~nODE-sb} ensures that $\mathcal{R}_{\text{ResNet}}(\mathcal{X}_{in}) \subseteq \Omega_{\text{ResNet}}(\mathcal{X}_{in})$. These inclusions hold due to the conservative nature of the reachability analysis and error bound computations in Section~\ref{subsect:bounding error} (Theorem~\ref{thm:error}). Consequently, whenever an algorithm returns \textbf{Safe}, the computed over-approximation is contained in $\mathcal{X}_s$, and so is the true reachable set it encloses.