\section{Discussion}
\vspace{-.05in}
Since $\textit{MSE}$ supports a wide variety of model classes and
  is easily automated, we report only $\textit{MSE}$-based evaluation
  results in this work.  However, our methodology may not tell
  the entire story about the robustness of all models: (1) it does
  not take into account the uncertainty in predictions; and (2) it
  does not use a held-out \mbox{test set to compute the MSE.}

Therefore, after the posterior predictive checks in ASTRA show how well the robustness transformations perform, we suggest that \NAME{} users further evaluate the subset of well-performing transformations (semi-automatically) for specific model classes using specialized methods (e.g., predictive evaluation on test data, cross-validation, and sensitivity analysis)~\cite{gelman2013bayesian}. These methods can complement \NAME{}'s automated analyses.

For example, one may further study
the model performance
on future observations using the following procedure.
If \NAME{} shows a poor fit (high $\textit{MSE}$)
using its noise models,
then predictions of future data are
also likely to be highly inaccurate.
If \NAME{} shows a good fit (low $\textit{MSE}$),
one could apply other analyses
to improve user confidence in the model.
In particular, for regression models, one could
conduct cross-validation by splitting
the existing data into training, validation, and test sets.
For time-series models, one can manually split
the data and apply leave-future-out (LFO) cross-validation~\cite{burkner2020approximate}.
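
To make the time-series case concrete, the following is a minimal sketch of a one-step-ahead LFO estimate, written in terms of $\textit{MSE}$ rather than the log predictive density used in~\cite{burkner2020approximate}. The notation is introduced here only for illustration and is an assumption, not part of \NAME{}: $y_{1:N}$ denotes the observed series, $L$ is the minimum number of observations required to fit the model, and $\hat{y}_{i+1 \mid 1:i}$ is the posterior predictive mean for observation $i+1$ when the model is fit only on $y_{1:i}$.
\[
  \textit{MSE}_{\mathrm{LFO}}
  \;=\;
  \frac{1}{N-L} \sum_{i=L}^{N-1}
  \bigl( y_{i+1} - \hat{y}_{i+1 \mid 1:i} \bigr)^{2} .
\]
Each of the $N-L$ terms uses only past observations to predict the next one, so no future data leaks into the fit. Computing this quantity exactly requires refitting the model once per term; \cite{burkner2020approximate} describe an approximation that reduces the number of refits.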


 We anticipate that our work, together with further automation of
 applying model robustness transformations and testing for model
 robustness, can lead to future work on
 (1) general techniques for improving PP robustness, and
 (2) libraries of techniques applicable to specific, but broad,
 classes of probabilistic models.
 In addition, we believe that symbolic techniques for 
 robustness analysis and inference (e.g.~\cite{psense,huang2021aqua}) 
 can further help improve the 
 reliability of the implementations of robust probabilistic programs.
 

\section{Related Work}
\looseness=-2 \mypara{Robust Probabilistic Modeling} We evaluated various \robts
previously proposed in the
literature~\citep{Wang:2017,wang2018general,berger1994overview}. These works
evaluated only a small number of programs, which makes the generality of their
methods unclear. \citet{Wang:2017} evaluated their method on six models and
compared against \citet{wang2018general} on a single model. \citet{wang2018general} evaluated
their method on four models. Also, these works did not report the run time of
the robust models. In addition to these approaches,
\citet{futami2017variational} proposed a robust version of the KL divergence to make
variational inference robust to outliers. Their approach focuses on making the
inference more robust instead of the model itself. \citet{gbohounme2017self}
proposed special measures to improve the robustness of logistic regression
models.





\looseness=-2
\mypara{Robustness of Neural Networks} Despite their tremendous success in
various domains, neural networks are known to be vulnerable to adversarial
examples. Researchers have proposed ways both to design attacks for testing the
robustness of neural networks~\cite{carlini2017towards,gopinath2017deepsafe} and
to defend against adversarial
observations~\cite{gu2014towards,papernot2016distillation,shaham2018understanding}.
In this work, we consider multiple attack (or noise) models used \mbox{previously for
probabilistic models.} The source of our noise is not necessarily
adversarial; it may instead stem from practical sources such as erroneous measurements,
data corruption, or random failures.


