\section{Automated Transformations}
\subsection{Storm-IR}
We implement our transformations on Storm-IR~\cite{dutta2019storm}, an
intermediate representation for probabilistic programs.
Table~\ref{fig:grammar} presents the syntax of Storm-IR. Storm-IR is an
imperative language that supports standard constructs, such as arithmetic
operations, loops, and conditionals, as well as probabilistic constructs, such as
sampling from distributions (\emph{Dist}) and conditioning on data
(\texttt{observe} and \texttt{factor}). The
Storm framework also provides translators from Storm-IR to other probabilistic
programming languages, such as Stan~\cite{carpenter2016stan} and
Pyro~\cite{bingham2018pyro}, and vice versa. Using Storm-IR makes our
transformations language-agnostic and lets them leverage a host of existing
program analyses (e.g., dimensional, interval, and data-flow analysis) that
simplify their implementation.

\label{sec:stormir}
\begin{table}[!ht]
\syntaxsize{}\centering
\begin{tabular}{rcl}
    $x$ & $\in$ & \textit{Vars} \\
    $c$ & $\in$ & $\textit{Consts} \cup \{-\infty, \infty\}$ \\
    $\textit{op}$ & $\in$ & $\{+, -, >, $...$  \}$  \\
    $\textit{Dist}$ & $\in$ & $\{$Normal, Uniform, ...$\}$ \\
  [10pt]
  Type &::=& \texttt{Int} $\mid$ \texttt{Float} \\      
  Decl &::=& $x$ \texttt{:} Type $\mid$ $x$ \texttt{:} [$c^+$]\\

    Expr &::=& $c$ $\mid$ $x$  $\mid$  \emph{Dist}.pdf(Expr) $\mid$\\
          & &     Expr \emph{op} Expr\\



    Stmt &::=& $x$ = Expr  \\
               &  & $\mid$ for $x$ $\in$ 1..N; \{
    Stmt$^{*}$ \}\\
    &  & $\mid$ \texttt{observe}(\emph{Dist}(Expr$^{*}$),  $x$)\\
    &  & $\mid$ \texttt{factor}(Expr)\\
    & & $\mid$ \texttt{if} (Expr) \texttt{then}
    Stmt$^{*}$ \texttt{else} Stmt$^{*}$ \\
    & & $\mid$ $x$ \texttt{:=} \emph{Dist}(Expr$^{*}$)\\
    & & $\mid$ Decl \\
    Program &::=&  Stmt$^{*}$   \\
\end{tabular}
\caption{Syntax of Storm-IR}
\label{fig:grammar}
\end{table}
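To make the grammar concrete, the following Python sketch encodes a subset of Storm-IR (constants, variables, \texttt{Dist(...).pdf(...)}, binary operators, sampling, \texttt{factor}, and \texttt{for} loops) as dataclasses and builds a small example program. The class names and the example program are hypothetical illustrations, not the Storm framework's API; conditionals, \texttt{observe}, and declarations are omitted for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


# --- Expressions (Expr) ---
@dataclass
class Const:
    value: float


@dataclass
class Var:
    name: str
    index: Optional["Var"] = None  # y[i] is encoded as Var("y", Var("i"))


@dataclass
class DistCall:
    dist: str                      # distribution name, e.g. "Normal"
    args: List["Expr"]             # distribution parameters


@dataclass
class Pdf:
    dist: DistCall                 # Dist(Expr*).pdf(Expr)
    value: "Expr"


@dataclass
class BinOp:
    op: str                        # "+", "-", "*", ">", ...
    lhs: "Expr"
    rhs: "Expr"


Expr = Union[Const, Var, Pdf, BinOp]


# --- Statements (Stmt) ---
@dataclass
class Sample:                      # x := Dist(Expr*)
    target: str
    dist: DistCall


@dataclass
class Factor:                      # factor(Expr)
    expr: Expr


@dataclass
class For:                         # for x in 1..N { Stmt* }
    var: str
    upper: str
    body: List["Stmt"]


Stmt = Union[Sample, Factor, For]


def factors(stmts: List[Stmt]):
    """Yield every factor statement, descending into loop bodies."""
    for st in stmts:
        if isinstance(st, Factor):
            yield st
        elif isinstance(st, For):
            yield from factors(st.body)


# b := Normal(0, 10); s := Uniform(0, 5);
# for i in 1..D { factor(Normal(b, s).pdf(y[i])) }
program: List[Stmt] = [
    Sample("b", DistCall("Normal", [Const(0.0), Const(10.0)])),
    Sample("s", DistCall("Uniform", [Const(0.0), Const(5.0)])),
    For("i", "D", [Factor(Pdf(DistCall("Normal", [Var("b"), Var("s")]),
                              Var("y", Var("i"))))]),
]
```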

\subsection{Automatically Transforming Programs}
\label{sec:autotrans}


Storm-IR~\cite{dutta2019storm} represents
standard and probabilistic language constructs, such as sampling from
distributions (Dist) and conditioning on data (factor), as a graph whose nodes
are program elements and whose edges encode control flow (similar to a standard
compiler control-flow graph~\cite{allen1970control}). Since Storm-IR supports
multiple languages (e.g., Stan, Pyro, Edward), it allows \NAME{} to be
language-agnostic and also provides a host of program analyses (e.g.,
dimensional, interval, and data-flow analysis) that help us implement the
transformations easily and correctly.

\NAME{} first parses the original probabilistic program into an abstract syntax
tree and converts it to Storm-IR. On this IR, searching for a code pattern from
Table~1 amounts to searching for a subgraph that encodes
the pattern (e.g., the statements corresponding to $\beta \sim
\pi_{\beta}(\alpha)$ and $y_{i=1}^{D}\sim F(\beta)$, which need not be
adjacent), while recording the concrete variable names (e.g., $\beta \mapsto$
\texttt{b}, $y \mapsto$ \texttt{data}) and distributions (e.g., $F \mapsto
\mathcal{N}(\texttt{b}, \texttt{s})$). After identifying the pattern, \NAME{}
checks that the transformation is legal, instantiates the transformation
template with the identified distributions and variables, and uses the
Storm-IR API to update the program graph. Users can implement new
transformations for \NAME{} on Storm-IR, which is analogous to writing a
compiler transformation pass.
The \NAME{} implementation allows applying transformations iteratively to the
same program. However, we observed that combining transformations does not
provide additional robustness benefits, while inference quality suffers from
the more complicated resulting model.
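The following Python sketch illustrates the matching step on a deliberately simplified, hypothetical tuple encoding of statements rather than Storm-IR's actual program graph: it looks for a sampled parameter $\beta$ and, anywhere later, a loop whose \texttt{factor} likelihood depends on $\beta$, and records the bindings ($\beta \mapsto$ \texttt{b}, $y \mapsto$ \texttt{data}, $F \mapsto$ \texttt{Normal(b, s)}). The function name and encoding are assumptions for illustration; legality checks and template instantiation are not shown.

```python
from typing import Dict, List, Optional

# Hypothetical flat encoding of Storm-IR statements (not the Storm API):
#   ("sample", var, dist_name, [arg, ...])          x := Dist(args)
#   ("for", loop_var, [stmt, ...])                  for loop_var in 1..D { ... }
#   ("factor", dist_name, [arg, ...], data_array)   factor(Dist(args).pdf(data[i]))


def match_prior_likelihood(program: List[tuple]) -> Optional[Dict[str, str]]:
    """Match beta ~ pi_beta(...) followed (not necessarily adjacently) by a loop
    containing factor(F(..., beta, ...).pdf(y[i])); return the bindings."""
    priors: Dict[str, str] = {}  # parameter name -> prior distribution name
    for stmt in program:
        if stmt[0] == "sample":
            _, var, dist, _args = stmt
            priors[var] = dist
        elif stmt[0] == "for":
            for inner in stmt[2]:
                if inner[0] != "factor":
                    continue
                _, lik, args, data = inner
                for arg in args:
                    if arg in priors:  # likelihood depends on a sampled parameter
                        return {
                            "beta": arg,
                            "pi_beta": priors[arg],
                            "F": f"{lik}({', '.join(args)})",
                            "y": data,
                        }
    return None


prog = [
    ("sample", "b", "Normal", ["0", "10"]),
    ("sample", "s", "Uniform", ["0", "5"]),
    ("for", "i", [("factor", "Normal", ["b", "s"], "data")]),
]
```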

Here we present the code patterns of the original and transformed programs
for each transformation:

\mypara{Bayesian Data Reweighting}
\begin{figure}[b!]
\begin{minipage}{0.47\textwidth}
  \centering
  \setlength\doublerulesep{1pt}
  \scriptsize
  \begin{tabular}{ll}\hline\hline

  $x_1$ := \textit{DistExpr}$_1$   &  prior \\
      \texttt{...} \\
  
          \texttt{for }($i$\texttt{ = 1..D}) \\
      \quad\texttt{factor}(\textit{DistExpr}$_2$($x_1$)\texttt{.pdf}($y[i]$)) & conditioning\\
      \texttt{return} $x_1$            & posterior \\\hline\hline
  
      \hspace{10.5em} $\Downarrow$ \\\hline\hline


      $x_1$ := \textit{DistExpr}$_1$   &  prior (unchanged) \\
      \textbf{\texttt{var }$\boldsymbol{w}$\texttt{[D]}} & init. weights. \\
      \textbf{\texttt{for }($\boldsymbol{i}$\texttt{ = 1..D})} \\
      \quad $\boldsymbol{w[i]}$\textbf{ := \textit{$\textit{Beta}(\boldsymbol{\gamma,\eta})$}} & reweighting dist. \\
      \texttt{...} \\
          \texttt{for }($i$\texttt{ = 1..D}) \\
      \quad\texttt{factor}(\textit{DistExpr}$_2$($x_1$)\texttt{.pdf}($y[i]$)$^{\boldsymbol{w[i]}}$)  & reweighted obs. \\
      \texttt{return} $x_1, \boldsymbol{w}$         & posteriors\\ \hline\hline
  \end{tabular}   
\caption{Reweighting Transformation Code Pattern}
\label{fig:drw-trans}
\end{minipage}
\end{figure}
Figure~\ref{fig:drw-trans} presents the code pattern for this
transformation. The transformation is applicable to any model with a
\texttt{factor} statement. The prior distributions of the model parameters
($x_1$) remain unchanged.
We introduce a new vector parameter $w$ with one weight per data point, each
drawn from a $\textit{Beta}(\gamma,\eta)$ prior, and multiply the
log-likelihood of each $y[i]$ in the \texttt{factor} statement by $w[i]$.
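A minimal Python sketch of the unnormalized log-posterior of the reweighted model, assuming (for illustration only) a Normal likelihood with a known scale \texttt{sigma} and a user-supplied prior log-density for $x_1$. Each $w[i]$ has a $\textit{Beta}(\gamma,\eta)$ prior and scales the log-likelihood of $y[i]$, so a small $w[i]$ reduces the influence of that point. All helper names are hypothetical.

```python
import math


def normal_logpdf(y, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (y - mu) ** 2 / (2 * sigma ** 2)


def beta_logpdf(w, a, b):
    # log Beta(w | a, b), valid for 0 < w < 1
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1) * math.log(w) + (b - 1) * math.log1p(-w))


def reweighted_log_joint(x1, w, y, sigma, gamma, eta, prior_logpdf):
    """Unnormalized log-posterior of the reweighted model (hypothetical helper)."""
    lp = prior_logpdf(x1)                                   # prior on x1 (unchanged)
    lp += sum(beta_logpdf(wi, gamma, eta) for wi in w)      # w[i] ~ Beta(gamma, eta)
    lp += sum(wi * normal_logpdf(yi, x1, sigma)             # w[i] * log p(y[i] | x1)
              for wi, yi in zip(w, y))
    return lp
```

For example, with a flat prior and $\gamma=\eta=1$, lowering the weight of an outlying point (say $y[3]=8$ around $x_1=0$) increases the joint log-density, which is how the posterior over $w$ learns to down-weight it.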


\mypara{Localization}
Figure~\ref{fig:loc-transpatt} presents the code pattern for
this transformation. This transformation is applicable whenever
there is a \texttt{factor} statement in a for-loop.
First, we introduce a vector parameter $\eta$ as the localized
form of $x_1$. Then we update the factor expression to relate each data point
$y[i]$ to its own realization $\eta[i]$ of the parameter. Each $\eta[i]$
receives the prior $\textit{Normal}(x_1, s)$, centered at the original
parameter, where $s$ has a new hyper-prior $\textit{Unif}(0, 1)$.
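The following Python sketch writes out the unnormalized log-posterior of the localized model. It assumes, for illustration only, that $\textit{DistExpr}_2$ is a Normal likelihood whose location is the localized parameter and whose scale \texttt{sigma\_y} is known; the helper names are hypothetical. Each $y[i]$ now depends only on $\eta[i]$, and the $\eta[i]$ are tied to $x_1$ through $\textit{Normal}(x_1, s)$.

```python
import math


def normal_logpdf(v, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (v - mu) ** 2 / (2 * sigma ** 2)


def localized_log_joint(x1, s, eta, y, sigma_y, prior_logpdf):
    """Unnormalized log-posterior of the localized model (hypothetical helper)."""
    if not 0.0 < s < 1.0:
        return -math.inf                                    # s ~ Unif(0, 1): zero density outside
    lp = prior_logpdf(x1)                                   # prior on x1 (unchanged); log Unif(0,1) = 0
    lp += sum(normal_logpdf(e, x1, s) for e in eta)         # eta[i] ~ Normal(x1, s)
    lp += sum(normal_logpdf(yi, e, sigma_y)                 # y[i] depends only on eta[i]
              for yi, e in zip(y, eta))
    return lp
```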
  \begin{figure}[!h]
    \begin{minipage}{0.47\textwidth}
    \centering
    \setlength\doublerulesep{1pt}
      \scriptsize
      \begin{tabular}{ll}\hline\hline
      $x_1$ := \textit{DistExpr}$_1$   &  prior \\
          \texttt{...} \\
          \texttt{for }($i$\texttt{ = 1..D}) \\
          \quad \texttt{factor}(\textit{DistExpr}$_2$(...,$x_1$,...)\texttt{.pdf}($y[i]$)) & conditioning on obs. $y$ \\
          \texttt{return} $x_1$            & posterior \\\hline\hline
      
      \hspace{10.5em}  $\Downarrow$ \\\hline\hline
          $x_1$ := \textit{DistExpr}$_1$   &  prior (unchanged) \\
          \textbf{$\boldsymbol{s}$ := \textit{Unif}($0, 1$)}  & new hyper-prior\\
          \textbf{\texttt{var }$\boldsymbol{\eta}$\texttt{[D]}} &  localized params. \\
          \textbf{\texttt{for }($\boldsymbol{i}$\texttt{ = 1..D})} \\
          \quad \textbf{$\boldsymbol{\eta[i]}$ \texttt{:=} \textit{Normal}($\boldsymbol{x_1, s}$)}   &  new priors  \\
          \texttt{...} \\
          \texttt{for }($i$\texttt{ = 1..D}) \\
          \quad \texttt{factor}(\textit{DistExpr}$_2$(...,$\boldsymbol{\eta[i]}$,...)\texttt{.pdf}($y[i]$))  & localized obs. \\
          \texttt{return} $x_1, \boldsymbol{\eta}$         & posteriors\\\hline\hline
      \end{tabular}   
    \caption{Localization Transformation Code Pattern}
    \label{fig:loc-transpatt}
    \end{minipage}
    \end{figure}

\mypara{Normal to Student-T} Figure~\ref{fig:n2t-patt} presents the code pattern
for this transformation. We replace the Normal distribution with a Student-T
distribution whose degrees of freedom $\nu$ receive a new uniform hyper-prior.
This transformation is applicable to Normal distributions in \texttt{factor}
statements.
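To see why the Student-T likelihood is more robust, the following Python sketch compares the two log-densities (the location-scale Student-T with $\nu$ degrees of freedom). For a point far from the mean, the Student-T log-density decreases only logarithmically in the squared residual, whereas the Normal log-density decreases quadratically, so a single outlier pulls the posterior less. The helper names are hypothetical.

```python
import math


def normal_logpdf(y, mu, sigma):
    z = (y - mu) / sigma
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * z * z


def student_t_logpdf(y, nu, mu, sigma):
    # log of Gamma((nu+1)/2) / (Gamma(nu/2) sqrt(nu*pi) sigma) * (1 + z^2/nu)^(-(nu+1)/2)
    z = (y - mu) / sigma
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1) / 2 * math.log1p(z * z / nu))
```

As $\nu$ grows, the Student-T log-density approaches the Normal one, so the transformed model can recover the original behavior if the data contain no outliers.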

\begin{figure}[!h]
  \begin{minipage}{0.47\textwidth}
  \centering
    \setlength\doublerulesep{1pt}
  \scriptsize
  \begin{tabular}{ll}\hline\hline
  $x_1$ := \textit{DistExpr}$_1$   &  prior mean\\
  $x_2$ := \textit{DistExpr}$_2$   &  prior std\\
  ... \\
    \texttt{for }($i$\texttt{ = 1..D}) \\
    \quad\texttt{factor}(\textit{Normal}($x_1,x_2$)\texttt{.pdf}($y[i]$)) & conditioning\\
      \texttt{return} $x_1$            & posterior \\\hline\hline
  
      \hspace{11.5em} $\Downarrow$ \\\hline\hline
  $x_1$ := \textit{DistExpr}$_1$   &  prior mean \\
  $x_2$ := \textit{DistExpr}$_2$   &  prior std \\
      \textbf{$\boldsymbol {\nu}$ := \textit{$\textit{Unif}(\texttt{...})$}} & new hyper-prior \\
  ... \\
      \texttt{for }($i$\texttt{ = 1..D}) \\
      \quad\texttt{factor}(\textbf{\textit{StudentT}}($\boldsymbol{\nu}, x_1, x_2$)\texttt{.pdf}($y[i]$))  & Student-T dist. \\
      \texttt{return} $x_1, \boldsymbol {\nu}$         & posteriors\\ \hline\hline
  \end{tabular}
   
\caption{Normal/Student-T Transformation Code Pattern}
\label{fig:n2t-patt}
\end{minipage}
\end{figure}

\mypara{Reparameterization and Localization of the Scale Parameter}
We present the code pattern for this transformation in
Figure~\ref{fig:rep-patt}.
This transformation is only applicable when there are Normal distributions in
the \texttt{factor} statement. We introduce a new vector parameter $\tau$,
where each $\tau[i]$ follows a $\textit{Gamma}(\nu/2, \nu/2)$ distribution with
a newly added hyper-parameter $\nu$.
We update the factor expression by dividing the standard deviation $x_2$ by the
square root of $\tau[i]$, so that each data point has its own scale
$x_2/\sqrt{\tau[i]}$.
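This reparameterization is the standard scale-mixture representation of the Student-T distribution: if $\tau \sim \textit{Gamma}(\nu/2, \nu/2)$ (shape, rate) and $y \mid \tau \sim \textit{Normal}(x_1, x_2/\sqrt{\tau})$, then marginally $y \sim \textit{StudentT}(\nu, x_1, x_2)$. The following Python sketch checks this numerically by Monte Carlo; the helper names, the test point, and the sample size are illustrative choices. Note that Python's \texttt{random.gammavariate} takes a shape and a \emph{scale}, so a rate of $\nu/2$ becomes a scale of $2/\nu$.

```python
import math
import random


def normal_pdf(y, mu, sigma):
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def student_t_pdf(y, nu, mu, sigma):
    z = (y - mu) / sigma
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(z * z / nu)) / sigma


def scale_mixture_pdf(y, nu, mu, sigma, n=100_000, seed=0):
    """Monte Carlo estimate of E_tau[Normal(y | mu, sigma / sqrt(tau))],
    with tau ~ Gamma(shape=nu/2, rate=nu/2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        tau = rng.gammavariate(nu / 2, 2 / nu)   # (shape, scale); scale = 1 / rate
        total += normal_pdf(y, mu, sigma / math.sqrt(tau))
    return total / n
```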


\mypara{Contaminated Group Mixture}
We present the code pattern for this transformation in
Figure~\ref{fig:mix-patt}. The transformation is only applicable when the
distribution in the \texttt{factor} statement has location and scale
parameters. We introduce new parameters for an outlier component: an outlier
probability $\rho_\textit{out}$ and a variable \textit{out}, sampled from a
\emph{LogNormal} distribution, that sets the outlier scale. The model then
conditions the data either on the original distribution or on the outlier
distribution, encoded as an if-then-else statement on a \emph{Bernoulli} draw.
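In the code pattern, a single Bernoulli draw selects the likelihood for the whole data set. Summing that draw out gives the log-likelihood sketched below in Python, assuming for illustration that \textit{Dist} is a Normal distribution; as in the figure, the outlier component uses the scale $\sqrt{\exp(\textit{out})}$. The helper names are hypothetical.

```python
import math


def normal_logpdf(y, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (y - mu) ** 2 / (2 * sigma ** 2)


def logsumexp2(a, b):
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def contaminated_loglik(y, x1, x2, rho_out, out):
    """log p(y | x1, x2, rho_out, out) with the Bernoulli(1 - rho_out) switch summed out."""
    ll_regular = sum(normal_logpdf(yi, x1, x2) for yi in y)                        # if-branch
    ll_outlier = sum(normal_logpdf(yi, x1, math.sqrt(math.exp(out))) for yi in y)  # else-branch
    return logsumexp2(math.log1p(-rho_out) + ll_regular,
                      math.log(rho_out) + ll_outlier)
```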




\begin{figure}[!h]
\begin{minipage}{0.47\textwidth}
\centering
    \setlength\doublerulesep{1pt}
  \scriptsize
  \begin{tabular}{ll}\hline\hline
    $x_1$ := \textit{DistExpr}$_1$   &  prior mean \\
    $x_2$ := \textit{DistExpr}$_2$   &  prior std \\
      \texttt{...} \\
    \texttt{for }($i$\texttt{ = 1..D}) \\
      \quad \texttt{factor}(\textit{Normal}($x_1, x_2$)\texttt{.pdf}($y[i]$)) & conditioning \\
      \texttt{return} $x_1, x_2$            & Gauss. posterior \\\hline\hline
  
      \hspace{13.5em} $\Downarrow$ \\\hline\hline
      $x_1$ := \textit{DistExpr}$_1$   &  prior mean \\
      $x_2$ := \textit{DistExpr}$_2$   &  prior std \\
      \textbf{$\boldsymbol{\nu}$ := \textit{DistExpr}$_3$}      & new hyper-prior \\
    \textbf{\texttt{for }($\boldsymbol{i}$\texttt{ = 1..D})} \\
      \quad \textbf{$\boldsymbol{\tau[i]}$ := \textit{Gamma}(${\boldsymbol\nu\texttt{/}{2}}, {\boldsymbol\nu\texttt{/}{2}}$)} & robustness factors \\ 
      \texttt{...} \\
    \texttt{for }($i$\texttt{ = 1..D}) \\
      \quad \texttt{factor}(\textit{Normal}($x_1, x_2\textbf{\texttt{/sqrt}}({\boldsymbol{\tau[i]}})$)\texttt{.pdf}($y[i]$)) & conditioning\\
      \texttt{return} $x_1, x_2$            &  Gauss. posterior \\\hline\hline

  \end{tabular}   
\caption{Reparameterization Transformation Code Pattern}
\label{fig:rep-patt}
\end{minipage}
\end{figure}

\begin{figure}[hb!]
\begin{minipage}{0.47\textwidth}
\centering
    \setlength\doublerulesep{1pt}
  \scriptsize
  \begin{tabular}{ll}\hline\hline
    $x_1$ := \textit{DistExpr}$_1$   &  prior mean \\
    $x_2$ := \textit{DistExpr}$_2$   &  prior std \\
      \texttt{...} \\
    \texttt{for }($i$\texttt{ = 1..D}) \\
    \quad \texttt{factor}(\textit{Dist}($x_1, x_2$, ...)\texttt{.pdf}($y[i]$)) & conditioning \\
      \texttt{return} $x_1, x_2$            & Gauss. posterior \\\hline\hline
  
      \hspace{13.5em} $\Downarrow$ \\\hline\hline
      $x_1$ := \textit{DistExpr}$_1$   &  prior mean  \\
      $x_2$ := \textit{DistExpr}$_2$   &  prior std \\
      \textbf{$\boldsymbol{\rho_\textit{out}}$ := \textit{Unif}(0, 0.5)} & outlier probability \\
      \textbf{$\boldsymbol{\mu_\textit{out}}$ := \textit{DistExpr}$\boldsymbol{_3}$}   &  outlier mean \\
      \textbf{$\boldsymbol{s_\textit{out}}$ := \textit{DistExpr}$\boldsymbol{_4}$}   &  outlier variance \\
      \textbf{\textit{out} := \textit{LogNormal}($\boldsymbol{\mu_\textit{out}, s_\textit{out}}$)} & outlier scale \\
      \texttt{...} \\
      \textbf{\texttt{if}(\textit{Bernoulli}($\boldsymbol{1 - \rho_\textit{out}}$))}        &  mixture \\
    \quad\texttt{for }($i$\texttt{ = 1..D}) \\
      \quad\quad \texttt{factor}(\textit{Dist}($x_1, x_2$)\texttt{.pdf}($y[i]$)) & conditioning\\
      \textbf{\texttt{else}}        &   outlier component \\
    \quad\texttt{for }($i$\texttt{ = 1..D}) \\
      \quad\quad \texttt{factor}(\textit{Dist}($x_1, \textbf{\texttt{sqrt}}$($\textbf{\texttt{exp}}(\textbf{\textit{out}}$))\texttt{.pdf}($y[i]$)) & conditioning\\

      \texttt{return} $x_1, x_2$            &  Gauss. posterior \\\hline\hline

  \end{tabular}
   
\caption{Contaminated Group Mixture Transformation Code Pattern}
\label{fig:mix-patt}
\end{minipage}
\end{figure}









\section{Correctness of the Transformations}\label{sec:correct_app}

\input{mse_improve_name_table}

We show that the transformations
    defined in Section~\ref{sec:autotrans} have the semantic effects
    proposed in the statistical literature (as summarized in
    Table~1). We leverage Stan's
    operational semantics \mbox{from~\cite{gorinova2019probabilistic}}.

     Given a program $P$ in the Storm-IR language, the Storm-IR
      translator translates $P$ into a Stan program $S$ with equivalent semantics.
       There is a one-to-one correspondence between Storm-IR expressions/statements and Stan
        expressions/statements, by the definition of the Storm-IR syntax (Appendix~\ref{sec:stormir})
        and the Stan syntax~\cite{gorinova2019probabilistic}.
        For example, let $\Leftrightarrow$ denote the translation relation between a
        Storm-IR expression/statement and a Stan expression/statement;
        then the translation rule for \texttt{factor} is:
        \[
            \frac{
                E_\textit{storm} \Leftrightarrow E'_\textit{stan}
            }
            {
                \textup{\texttt{factor}}(E_\textit{storm}) \Leftrightarrow \mathbf{target} = \mathbf{target} + \log(E'_\textit{stan})
            }.
        \]
      The rule states that a Storm-IR \texttt{factor} statement is translated to an assignment to the special Stan variable \textbf{target}, which by convention accumulates the unnormalized log-posterior, where the expression $E_\textit{storm}$ is recursively translated to $E'_\textit{stan}$. The rules for the other statements are similar.
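The \texttt{factor} rule can be read as a recursive translation function. The following Python sketch implements it for a small, hypothetical expression AST. In concrete Stan syntax, $\mathbf{target} = \mathbf{target} + \log(\ldots)$ is written with Stan's \texttt{target +=} statement, and a Storm-IR \texttt{Dist(args).pdf(v)} is rendered here as \texttt{exp} of Stan's corresponding \texttt{\_lpdf} function. The AST classes and the distribution-name table are illustrative assumptions, not the Storm translator.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Const:
    value: float


@dataclass
class Var:
    name: str


@dataclass
class Index:                 # y[i]
    array: str
    index: str


@dataclass
class Pdf:                   # Dist(args).pdf(value)
    dist: str
    args: List["Expr"]
    value: "Expr"


@dataclass
class BinOp:
    op: str
    lhs: "Expr"
    rhs: "Expr"


Expr = Union[Const, Var, Index, Pdf, BinOp]


@dataclass
class Factor:
    expr: Expr


# Storm-IR distribution name -> Stan log-density function (illustrative subset)
STAN_LPDF = {"Normal": "normal_lpdf", "StudentT": "student_t_lpdf",
             "Gamma": "gamma_lpdf", "Beta": "beta_lpdf",
             "LogNormal": "lognormal_lpdf", "Uniform": "uniform_lpdf"}


def translate_expr(e: Expr) -> str:
    """Recursively translate E_storm to E'_stan."""
    if isinstance(e, Const):
        return repr(e.value)
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Index):
        return f"{e.array}[{e.index}]"
    if isinstance(e, Pdf):
        args = ", ".join(translate_expr(a) for a in e.args)
        return f"exp({STAN_LPDF[e.dist]}({translate_expr(e.value)} | {args}))"
    if isinstance(e, BinOp):
        return f"({translate_expr(e.lhs)} {e.op} {translate_expr(e.rhs)})"
    raise TypeError(f"unsupported expression: {e!r}")


def translate_factor(stmt: Factor) -> str:
    """factor(E) translates to target = target + log(E'), i.e. Stan's target +=."""
    return f"target += log({translate_expr(stmt.expr)});"
```

A real translator would likely simplify \texttt{log(exp(...))} to the \texttt{\_lpdf} call itself; the sketch keeps the literal form of the rule.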

            \begin{definition}
Let $P$ be any Storm-IR program to which \NAME{} can apply a
transformation $T$, yielding a transformed program $P_T$ according to the
transformation code patterns shown in Section~\ref{sec:autotrans}.
      \end{definition}

      
      \begin{definition}
      Let $p(\boldsymbol{\theta}|y)$ and $p_{T}(\boldsymbol{\theta}|y)$ denote the posteriors of the original program and of the program transformed by $T$, respectively, as defined in Table~1, where $\boldsymbol{\theta}$ represents all the parameters of the program and $y$ is the data.
      \end{definition}
      
    \begin{theorem}
      If the distribution of the program $P$ is equivalent (up to a
      unique normalizing constant) to $p(\boldsymbol{\theta}|y)$, then
      the distribution of $P_T$ is equivalent (up to a unique
      normalizing constant) to $p_{T}(\boldsymbol{\theta}|y)$.
    \end{theorem}


  We sketch the proof next. We first translate the programs $P$ and $P_T$
      to equivalent Stan programs $S$ and $S_T$, respectively, as discussed above.
      By Stan's operational
      semantics presented in~\citet{gorinova2019probabilistic},
      there exists a unique end state $s$ for $S$ such that
        $
            ((y, \boldsymbol{\theta}, \mathbf{target} \mapsto 0), S) \Downarrow s
        $
        and
        $
            s[\mathbf{target}] = \log p^* (\boldsymbol{\theta}|y),
        $
        where $p^* (\boldsymbol{\theta}|y)$ is the unnormalized posterior, which uniquely defines
        the posterior via
        $p(\boldsymbol{\theta}|y) \propto p^* (\boldsymbol{\theta}|y)$.
        Similarly, $S_T$ results in a unique end state $s_T$ with $s_T[\mathbf{target}] = \log p_T^* (\boldsymbol{\theta}|y)$ and $p_T(\boldsymbol{\theta}|y) \propto p^*_T (\boldsymbol{\theta}|y)$.
      Since $P$ and $S$ are equivalent, and
      $P_T$ and $S_T$ are equivalent, we then apply structural induction on the
      Stan statements obtained from each code pattern in
      Figures~\ref{fig:drw-trans},~\ref{fig:loc-transpatt},~\ref{fig:n2t-patt},~\ref{fig:rep-patt},
      and~\ref{fig:mix-patt} to derive the unnormalized posteriors of
      each original and transformed program, $p^* (\boldsymbol{\theta}|y)$ and
      $p_{T}^* (\boldsymbol{\theta}|y)$, respectively.
      For each pattern, we can then directly verify that $p^* (\boldsymbol{\theta}|y)$ is proportional to $p(\boldsymbol{\theta}|y)$ as defined in Table~1, and that $p_T^* (\boldsymbol{\theta}|y)$ is proportional to $p_T(\boldsymbol{\theta}|y)$.
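As an illustration of this last step, consider the Normal-to-Student-T pattern (Figure~\ref{fig:n2t-patt}), ignoring the statements elided by ``...''. Assuming that each Storm-IR sampling statement $x := \textit{Dist}(\ldots)$ translates to a Stan sampling statement that adds $\log \textit{Dist}(x \mid \ldots)$ to \textbf{target}, and that each \texttt{factor} is translated by the rule above, the end states satisfy
\begin{align*}
s[\mathbf{target}] &= \log p(x_1) + \log p(x_2) + \sum_{i=1}^{D} \log \mathcal{N}(y[i] \mid x_1, x_2),\\
s_T[\mathbf{target}] &= \log p(x_1) + \log p(x_2) + \log p(\nu) + \sum_{i=1}^{D} \log \textit{StudentT}(y[i] \mid \nu, x_1, x_2),
\end{align*}
where $p(x_1)$ and $p(x_2)$ are the densities of $\textit{DistExpr}_1$ and $\textit{DistExpr}_2$, and $p(\nu)$ is the density of the new uniform hyper-prior. Exponentiating gives $p^*(\boldsymbol{\theta}|y)$ with $\boldsymbol{\theta} = (x_1, x_2)$ and $p_T^*(\boldsymbol{\theta}|y)$ with $\boldsymbol{\theta} = (x_1, x_2, \nu)$, which are the unnormalized posteriors of the Normal and Student-T models, respectively.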
      










\section{Best MSE Improvement for Different Noise Models}
\label{sec:mse_imp}
Tables~\ref{tab:imp_table_advi} and~\ref{tab:imp_table_nuts} present the best
MSE improvements for ADVI and NUTS across different noise models and programs.
Cells marked ``--'' indicate that the noise model is not applicable to the data
in the program.


\section{Convergence Scores at Noise Levels 2 and 6}
\label{sec:all_conv}

Tables~\ref{tab:rhatlevel2} and~\ref{tab:rhatlevel6} present the convergence scores ($\hat{R}$) at noise levels 2 and 6. The overall trends in the convergence scores are similar across noise levels.
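The tables aggregate per-model $\hat{R}$ values with a geometric mean. The following Python sketch shows that aggregation; the input values are made up for illustration and are not taken from our experiments. Compared with the arithmetic mean, the geometric mean is less dominated by a single very large $\hat{R}$.

```python
import math


def geometric_mean(values):
    """exp of the mean log; requires positive values (R-hat is always positive)."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Made-up R-hat values for three models: two converged, one badly not converged.
rhats = [1.0, 1.0, 4.0]
print(geometric_mean(rhats))      # about 1.587 (the cube root of 4)
print(sum(rhats) / len(rhats))    # arithmetic mean: 2.0
```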

\begin{table}[!ht]
  \caption{(Geometric-)Mean of Rhat at Noise Level 2}%
  \label{tab:rhatlevel2}
\centering
  \scriptsize
  \setlength{\tabcolsep}{4pt}
 \begin{tabular}{l|rr|rr|rr}
  \toprule
    \textbf{Transformations} & \multicolumn{2}{c|}{\textbf{Outliers}} & \multicolumn{2}{c|}{\textbf{Hidden Group}} & \multicolumn{2}{c}{\textbf{Skewed Data}} \\ \midrule
  & ADVI & NUTS & ADVI & NUTS & ADVI & NUTS\\
\midrule
Original & 1.75 & 1.05 & 1.16 & 1.00& 2.43 & 1.08\\ 
Reweighting & 1.33 & 1.11 & 1.19 & 1.01& 1.40 & 1.03\\ 
Localized-Loc & 3.40 & 1.38 & 2.18 & 1.13& 4.15 & 1.21\\ 
Localized-Scale & 4.24 & 1.43 & 1.85 & 1.03& 4.47 & 1.05\\ 
Reparam-Local & 2.02 & 1.25 & 1.25 & 1.02& 2.36 & 1.15\\ 
StudentT & 1.66 & 1.41 & 1.22 & 1.00& 1.72 & 1.34\\ 
Cont. Group Mixture & 7.17 & -- & 8.77 & --& 8.43 & --\\ 
\bottomrule
\end{tabular}
\end{table}



\begin{table}[!ht]
  \caption{(Geometric-)Mean of Rhat at Noise Level 6}%
  \label{tab:rhatlevel6}
\centering
  \scriptsize
  \setlength{\tabcolsep}{4pt}
 \begin{tabular}{l|rr|rr|rr}
  \toprule
    \textbf{Transformations} & \multicolumn{2}{c|}{\textbf{Outliers}} & \multicolumn{2}{c|}{\textbf{Hidden Group}} & \multicolumn{2}{c}{\textbf{Skewed Data}} \\ \midrule
  & ADVI & NUTS & ADVI & NUTS & ADVI & NUTS\\
\midrule
Original & 1.79 & 1.46 & 1.32 & 1.00& 1.65 & 1.04\\ 
Reweighting & 1.34 & 1.19 & 1.17 & 1.00& 1.30 & 1.01\\ 
Localized-Loc & 3.86 & 1.34 & 2.83 & 1.16& 3.77 & 1.25\\ 
Localized-Scale & 3.04 & 1.38 & 2.05 & 1.04& 3.75 & 1.10\\ 
Reparam-Local & 2.01 & 1.34 & 1.32 & 1.00& 2.35 & 1.18\\ 
StudentT & 1.56 & 1.26 & 1.19 & 1.01& 1.97 & 1.37\\ 
Cont. Group Mixture & 8.99 & -- & 8.48 & --& 7.86 & --\\ 
\bottomrule
\end{tabular}
\end{table}


\section{Other Diagnostics for NUTS at Noise Level 10}
Here we present two additional diagnostics for NUTS: the effective sample size
(ESS) and trajectory divergences.

Table~\ref{tab:esslevel10} presents the ESS
at noise level 10, out of $4\times1000$ post-warmup samples,
under a timeout of 8 minutes per chain. A small ESS also indicates a lack of convergence.
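For intuition about what a small ESS means, the following Python sketch estimates the ESS of a single chain as $N / (1 + 2\sum_k \rho_k)$, summing the lag-$k$ autocorrelations $\rho_k$ until the first negative estimate. This is a simplified single-chain estimator chosen for illustration; Stan's reported ESS combines multiple chains and uses Geyer's initial monotone sequence, so its values differ. The function name, the chain lengths, the AR(1) coefficient 0.9, and the seed are arbitrary illustration choices.

```python
import random


def ess(chain, max_lag=1000):
    """Simplified single-chain ESS: N / (1 + 2 * sum of autocorrelations),
    truncated at the first negative autocorrelation estimate."""
    n = len(chain)
    mean = sum(chain) / n
    centered = [v - mean for v in chain]
    var = sum(c * c for c in centered) / n
    if var == 0:
        raise ValueError("chain has zero variance")
    rho_sum = 0.0
    for lag in range(1, min(max_lag, n - 1) + 1):
        rho = sum(centered[t] * centered[t + lag] for t in range(n - lag)) / (n * var)
        if rho < 0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


rng = random.Random(1)
iid = [rng.gauss(0.0, 1.0) for _ in range(5000)]   # independent draws: ESS close to N
ar = [0.0]
for _ in range(9999):                               # AR(1) chain: strongly autocorrelated
    ar.append(0.9 * ar[-1] + rng.gauss(0.0, 1.0))
print(round(ess(iid)), round(ess(ar)))
```

The autocorrelated chain has the same length scale as a well-mixing one but carries far fewer effectively independent draws, which is why a low ESS signals poor mixing.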

\newcolumntype{H}{>{\setbox0=\hbox\bgroup}c<{\egroup}@{}}
\begin{table}[!ht]
  \caption{(Geometric-)Mean of ESS at Noise Level 10}
  \label{tab:esslevel10}
\centering
  \scriptsize
  \setlength{\tabcolsep}{4pt}
 \begin{tabular}{l|Hr|Hr|Hr}
  \toprule
    \textbf{Transformations} & \multicolumn{2}{c|}{\textbf{Outliers}} & \multicolumn{2}{c|}{\textbf{Hidden Group}} & \multicolumn{2}{c}{\textbf{Skewed Data}} \\ \midrule
Original & - & 2482.04 & - & 2526.45& - & 2144.90\\ 
Reweighting & - & 2422.91 & - & 2289.96& - & 2397.99\\ 
Localized-Loc & - & 876.01 & - & 1067.06& - & 1179.52\\ 
Localized-Scale & - & 1114.61 & - & 1467.49& - & 842.73\\ 
Reparam-Local & - & 1707.49 & - & 2296.89& - & 2029.92\\ 
StudentT & - & 1338.56 & - & 2673.49& -  & 1786.98\\ 
Cont. Group Mixture & - & - & - & -& - & -\\ 
\bottomrule
\end{tabular}
\end{table}


For NUTS, the geometric mean of the \emph{trajectory divergence} over all
applicable models is smaller than 0.01 for each transformation at each noise
level, and 90\% of the models have zero trajectory divergence. Such a small
fraction of divergent trajectories does not indicate any convergence concern.




