\newpage
\onecolumn
\appendix

\section{Implementation Details}
For reproducibility, we will release the full code upon acceptance. In the meantime, we provide below the detailed PyTorch \cite{pytorch_neurips} implementation of Algorithm \ref{alg:RS_DS}.

\lstinputlisting[language=Python]{NeurIPS21/code.py}
For comparisons against \cite{cohen2019certified}, we followed their official code at \url{https://github.com/locuslab/smoothing}. Following common practice, we also used their code to certify all models in all of our experiments. For comparisons against \textit{SmoothAdv} \citep{salman2019provably} and \textit{MACER}, we followed their official implementations at \url{https://github.com/Hadisalman/smoothing-adversarial} and \url{https://github.com/RuntianZ/macer}, respectively.

% \newpage
\section{Data Dependent Greedy Search Over $\sigma$}
We observe that the optimization problem \eqref{eq:our_objective_v2} that we solve for every input $x$ is one dimensional in $\sigma_x^*$. In this section, we show that heuristic grid search procedures are far inferior to solving Equation \eqref{eq:our_objective_v2} with our solver in Algorithm \ref{alg:RS_DS}. In particular, we show that a simple grid search baseline for data dependent certification does not work when given the same sample budget as our approach. We conduct experiments in which we apply data dependent smoothing only at certification time to pre-trained $\text{SmoothAdv}$ models trained with $\sigma \in \{0.12, 0.25, 0.50\}$ on CIFAR10. We examine $\text{SmoothAdv-DS}$, which is certified with $K=100$ iterations of Algorithm \ref{alg:RS_DS} and $n=1$ sample to approximate the expectation. Since $n=1$ and every iteration requires one forward and one backward pass, the data dependent certification of $\text{SmoothAdv-DS}$ performs a total of $2K=200$ evaluations for every input $x$ before certifying with the optimized $\sigma_x^*$. We therefore compare against a crude grid search baseline over $\hat{\sigma}_x^*$ which, for a fair comparison, is given the same budget of $200$ evaluations. We restrict the grid search to $\hat{\sigma}_x^* \in [0,1]$ with a resolution of $\nicefrac{n}{200}$, so that, with $n$ samples per grid point, the total number of evaluations matches the 200 evaluations of $\text{SmoothAdv-DS}$. That is to say, the grid search heuristic solves the following problem:

\begin{equation}
\begin{aligned}
\label{eq:grid_search_obj}
    \hat{\sigma}_x^* = \argmax_{\sigma_k \in \{0,\frac{n}{200},\frac{2n}{200}, \dots, 1-\frac{n}{200}, 1\}} \frac{\sigma_k}{2} \left(\Phi^{-1}\left(
    \frac{1}{n} \sum_{j=1}^n \hat{f}_\theta^{c_A}(x+\sigma_k \hat{\epsilon}_j)\right) - \Phi^{-1}\left(\max_{c \neq c_A} \frac{1}{n} \sum_{j=1}^n \hat{f}_\theta^c(x+\sigma_k\hat{\epsilon}_j)\right)\right),
\end{aligned} 
\end{equation}
where $\hat{\epsilon}_j \sim \mathcal{N}(0, I)$ are i.i.d.\ Gaussian noise samples.

We also vary the number of samples $n \in \{1,2,4,10\}$ in the grid search pipeline. Note that, under the fixed budget, this trades off the accuracy of the expectation approximation against the resolution of the grid over $\hat{\sigma}_x^*$.
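
To make the baseline in Equation \eqref{eq:grid_search_obj} concrete, we include a minimal pure-Python sketch. It is an illustration under stated assumptions rather than the code used in our experiments: the base classifier \texttt{f} is a hypothetical callable returning a list of class probabilities, the predicted class $c_A$ is assumed to be given, fresh Gaussian noise is drawn for every grid point, estimates are clipped away from $0$ and $1$ so that $\Phi^{-1}$ stays finite, and $\sigma=0$ is skipped since it always yields a zero radius. The names \texttt{grid\_search\_sigma}, \texttt{budget}, and \texttt{eps} are ours.

\begin{lstlisting}[language=Python]
import random
from statistics import NormalDist

PHI_INV = NormalDist().inv_cdf  # standard normal quantile function, Phi^{-1}


def grid_search_sigma(f, x, c_a, n, budget=200, eps=1e-6, seed=0):
    """Hypothetical sketch of the grid-search baseline (Eq. grid_search_obj).

    f      : base classifier, maps a list of floats to a list of class probabilities
    x      : input, a list of floats
    c_a    : predicted (top) class, assumed to be given
    n      : noise samples per grid point; must divide `budget`
    budget : total number of calls to f (200 in our comparison)
    """
    if budget % n != 0:
        raise ValueError("n must divide the evaluation budget")
    rng = random.Random(seed)

    def clip(p):
        # keep Phi^{-1} finite when an estimate is exactly 0 or 1
        return min(max(p, eps), 1.0 - eps)

    best_sigma, best_radius = 0.0, 0.0  # sigma = 0 always yields radius 0
    for k in range(1, budget // n + 1):  # grid n/budget, 2n/budget, ..., 1
        sigma = k * n / budget
        # Monte Carlo estimate of E_eps[f(x + sigma * eps)] from n Gaussian samples
        outputs = [f([xi + sigma * rng.gauss(0.0, 1.0) for xi in x]) for _ in range(n)]
        avg = [sum(col) / n for col in zip(*outputs)]
        p_a = clip(avg[c_a])
        p_b = clip(max(p for c, p in enumerate(avg) if c != c_a))
        radius = sigma / 2.0 * (PHI_INV(p_a) - PHI_INV(p_b))
        if radius > best_radius:
            best_sigma, best_radius = sigma, radius
    return best_sigma, best_radius
\end{lstlisting}

With $n=4$, the loop visits the 50 grid points $\{0.02, 0.04, \dots, 1\}$ and calls \texttt{f} exactly $50 \times 4 = 200$ times, matching the evaluation budget of $\text{SmoothAdv-DS}$.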

We summarize our results in Figure \ref{fig:appendix_greedy}. In the first three figures, we report certified accuracies when each model is certified with the same $\sigma \in \{0.12, 0.25, 0.50\}$ used in training and without data dependent smoothing, \ie with a fixed $\sigma$ for all inputs. We refer to these plots as $\text{SmoothAdv-0.12}$, $\text{SmoothAdv-0.25}$, and $\text{SmoothAdv-0.50}$. We refer to the data dependent grid search baseline as $\text{SmoothAdv-GDS-n-1}$, $\text{SmoothAdv-GDS-n-2}$, $\text{SmoothAdv-GDS-n-4}$, and $\text{SmoothAdv-GDS-n-10}$, where $n$ is the number of samples approximating the expectation in Equation \eqref{eq:grid_search_obj}. We report the envelopes in the last figure.

First, we observe that the larger the number of samples $n$ used to approximate the expectation, the better the overall certified accuracy, regardless of the $\sigma$ used to train the model. However, the grid search performance remains far inferior to the data \textit{independent} baseline with a fixed $\sigma$, which is in turn inferior to our approach. This is also evident from the envelopes in the last figure. This indicates that while data dependent smoothing is essential for improving performance, a careful optimization of $\sigma_x^*$ is necessary for it to work. We reiterate that both the grid search heuristic and our approach use the same number of evaluations, \ie 200, when certifying the model; yet the results of our approach reported in Figure \ref{fig:SmoothAdv} are far superior.


\input{NeurIPS21/figs/appendix_greedy_figures.tex}




\section{Memory-Based Certification for Data Dependent Classifiers}


% \begin{algorithm}[t]
% \SetAlgoLined
% \KwInput{input point $x_{N+1}$, certified region $\mathcal{R}_{N+1}$, prediction $\mathcal{C}_{N+1}$, and memory $\mathcal{M}$}
% \KwResult{Prediction for $x_{N+1}$ and certified region at $x_{N+1}$ that does not intersect with any certified region in $\mathcal{M}$.}
% % $\mathcal{S}_x \leftarrow \mathcal{S}_x^{g}$\;
% \For{$(x_i, \mathcal{C}_i, \mathcal{R}_i) \in \mathcal{M}$}{
%     \uIf{$x_{N+1} \in \mathcal{R}_i$}{
%         \uIf{$\mathcal{C}_{N+1} = \mathcal{C}_i$}{
%             add $(x_{N+1}, \mathcal{C}_{N+1}, \mathcal{R}_{N+1})$ to $\mathcal{M}$\;
%             \Return $\mathcal{C}_{N+1}$,  $\mathcal{R}_{N+1}$\;
%         } \Else {
%         $\tilde{\mathcal{R}}_{N+1}$ = \texttt{LargestCertInSubset}($x_i$, $\mathcal{R}_i$, $x_{N+1}$, $\mathcal{R}_{N+1}$)\;
%         add $(x_{N+1}, \mathcal{C}_{i}, \tilde{\mathcal{R}}_{N+1})$ to $\mathcal{M}$\; 
%             \Return $\mathcal{C}_{i}$,  $\tilde{\mathcal{R}}_{N+1}$\;
%         }
%     }
%     \uIf{\texttt{Intersect}($\mathcal{R}_{N+1}, \mathcal{R}_i$)}{
%     $\mathcal{R}'_{N+1}$ = \texttt{LargestCertOutSubset}($x_i$, $\mathcal{R}_i$, $x_{N+1}$, $\mathcal{R}_{N+1}$)\;
%         %compute large non-intersecting with $\mathcal{S}_i$ region $\mathcal{R}_{N+1}$\;
%         $\mathcal{R}_{N+1} \leftarrow \mathcal{R}'_{N+1}$\;
%     }
% }
% add $(x_{N+1}, \mathcal{C}_{N+1}, \mathcal{R}_{N+1)}$ to $\mathcal{M}$\;
% \Return $\mathcal{C}_{N+1}$,  $\mathcal{R}_{N+1}$\;
% \caption{Memory-Based Certification}
% \label{alg:practical_certification}
% \end{algorithm}


\begin{algorithm}[H]
\SetAlgoLined
\KwInput{input point $x_{N+1}$, certified region $\mathcal{R}_{N+1}$, prediction $\mathcal{C}_{N+1}$, and memory $\mathcal{M}$}
\KwResult{Prediction for $x_{N+1}$ and certified region at $x_{N+1}$ that does not intersect any certified region in $\mathcal{M}$ with a different prediction.}
% $\mathcal{S}_x \leftarrow \mathcal{S}_x^{g}$\;
\For{$(x_i, \mathcal{C}_i, \mathcal{R}_i) \in \mathcal{M}$}{
    \uIf{$\mathcal{C}_{N+1} \neq \mathcal{C}_i$}{
        \uIf{$x_{N+1} \in \mathcal{R}_i$}{
        $\tilde{\mathcal{R}}_{N+1}$ = \texttt{LargestInSubset}($\mathcal{R}_i$, $\mathcal{R}_{N+1}$)\;
        $\mathcal{R}_{N+1} \leftarrow \tilde{\mathcal{R}}_{N+1}$\;
        $\mathcal{C}_{N+1} \leftarrow \mathcal{C}_i$\;
        }        \uElseIf{\texttt{Intersect}($\mathcal{R}_{N+1}, \mathcal{R}_i$)}{
        $\mathcal{R}'_{N+1}$ = \texttt{LargestOutSubset}($\mathcal{R}_i$, $\mathcal{R}_{N+1}$)\;
            %compute large non-intersecting with $\mathcal{S}_i$ region $\mathcal{R}_{N+1}$\;
            $\mathcal{R}_{N+1} \leftarrow \mathcal{R}'_{N+1}$\;
        }
    }
}
add $(x_{N+1}, \mathcal{C}_{N+1}, \mathcal{R}_{N+1})$ to $\mathcal{M}$\;
\Return $\mathcal{C}_{N+1}$,  $\mathcal{R}_{N+1}$\;
\caption{Memory-Based Certification}
\label{alg:practical_certification}
\end{algorithm}

Let $\mathcal{M} = \{(x_i, \mathcal{C}_i,\mathcal{R}_i)\}_{i=1}^N$ be the set of triplets consisting of the input $x_i$, its prediction $\mathcal{C}_i$, and the certified region $\mathcal{R}_i$ at $x_i$, which is characterized by the certified radius $R_i$ and the center $x_i$. Moreover, we assume that $\mathcal{R}_{i} \cap \mathcal{R}_j = \emptyset$ for all $i\neq j$ with $\mathcal{C}_i \neq \mathcal{C}_j$. That is to say, the certified regions of inputs stored in the memory $\mathcal{M}$ do not intersect whenever their predictions differ. This is the key property for a sound certification procedure: if it did not hold, the data dependent classifier would produce different predictions within the same certified region. To circumvent this nuisance of the data dependent classifier $g_\theta$, we update the memory while enforcing this property. In particular, we certify the data dependent classifier using the classical Monte Carlo approach of \cite{cohen2019certified} while guaranteeing that the certified region does not intersect the certified region of any previously predicted input with a different prediction. 
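
The non-intersection property above is easy to check when every certified region is an open $\ell_2$-ball, since two such balls intersect exactly when the distance between their centers is smaller than the sum of their radii. The following sketch is a hypothetical helper for illustration, not part of our released code; it assumes the memory is stored as a list of \texttt{(center, label, radius)} tuples.

\begin{lstlisting}[language=Python]
import math
from itertools import combinations


def balls_intersect(c1, r1, c2, r2):
    """Open l2-balls B(c1, r1) and B(c2, r2) overlap iff ||c1 - c2|| < r1 + r2."""
    return math.dist(c1, c2) < r1 + r2


def memory_is_sound(memory):
    """Return True if no two differently-labelled entries have intersecting balls.

    memory : list of (center, label, radius) tuples, center being a list of floats
    """
    for (x_i, c_i, r_i), (x_j, c_j, r_j) in combinations(memory, 2):
        if c_i != c_j and balls_intersect(x_i, r_i, x_j, r_j):
            return False
    return True
\end{lstlisting}

The check compares all pairs and therefore costs $\mathcal{O}(N^2)$ distance computations, so it is meant as a sanity check of a memory rather than as part of the certification loop.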


In what follows, we present Algorithm \ref{alg:practical_certification}, which enforces the non-intersection property of the certified regions in $\mathcal{M}$. Let $\mathcal{R}_{N+1}$ be the certified region of $g_\theta$ at $x_{N+1}$.
\textbf{(i)} If $x_{N+1} \in \mathcal{R}_i$ and $\argmax_c g^c_\theta(x_{N+1}) \neq \mathcal{C}_i$, we find the largest $\tilde{\mathcal{R}}_{N+1}$ such that both $\tilde{\mathcal{R}}_{N+1} \subseteq \mathcal{R}_{N+1}$ and $\tilde{\mathcal{R}}_{N+1} \subseteq \mathcal{R}_{i}$ hold. When the certified regions are $\ell_2$-balls, finding such a $\tilde{\mathcal{R}}_{N+1}$ is straightforward. We denote this operation by \texttt{LargestInSubset} (second example in Figure \ref{fig:memory-based-algorithm}). We then replace $\mathcal{R}_{N+1}$ with the refined $\tilde{\mathcal{R}}_{N+1}$ and change $\mathcal{C}_{N+1}$ to $\mathcal{C}_i$. \textbf{(ii)} Otherwise, if $x_{N+1} \notin \mathcal{R}_i$, $\argmax_c g_\theta^c(x_{N+1}) \neq \mathcal{C}_i$, and $\mathcal{R}_{N+1} \cap \mathcal{R}_i \neq \emptyset$, we find the largest subset $\mathcal{R}'_{N+1} \subseteq \mathcal{R}_{N+1}$ that does not intersect $\mathcal{R}_i$. We denote this operation by \texttt{LargestOutSubset} (third example in Figure \ref{fig:memory-based-algorithm}); it is also straightforward to compute when the regions are $\ell_2$-balls. We then replace $\mathcal{R}_{N+1}$ with the refined $\mathcal{R}'_{N+1}$. Here, \texttt{Intersect} is a function that returns whether two $\ell_2$-balls intersect. Finally, we add $(x_{N+1}, \mathcal{C}_{N+1}, \mathcal{R}_{N+1})$ to the memory. A PyTorch implementation of Algorithm \ref{alg:practical_certification} is provided at the end of this section.
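
For $\ell_2$-balls, \texttt{Intersect}, \texttt{LargestInSubset}, and \texttt{LargestOutSubset} reduce to comparing the center distance $d = \|x_{N+1} - x_i\|_2$ with the radii. Assuming every refined region remains an $\ell_2$-ball centered at $x_{N+1}$ (so that only its radius changes), the largest admissible radii are $\min(R_{N+1}, R_i - d)$ for \texttt{LargestInSubset} and $\min(R_{N+1}, d - R_i)$ for \texttt{LargestOutSubset}. The pure-Python sketch below implements Algorithm \ref{alg:practical_certification} under this assumption; the function names are hypothetical stand-ins for the pseudo-code, and this is not the PyTorch implementation provided at the end of this section.

\begin{lstlisting}[language=Python]
import math


def intersect(x_i, r_i, x_new, r_new):
    """Open l2-balls B(x_i, r_i) and B(x_new, r_new) overlap."""
    return math.dist(x_i, x_new) < r_i + r_new


def largest_in_subset(x_i, r_i, x_new, r_new):
    """Largest rho with B(x_new, rho) inside both B(x_new, r_new) and B(x_i, r_i)."""
    return min(r_new, r_i - math.dist(x_i, x_new))


def largest_out_subset(x_i, r_i, x_new, r_new):
    """Largest rho with B(x_new, rho) inside B(x_new, r_new) and disjoint from B(x_i, r_i)."""
    return min(r_new, max(math.dist(x_i, x_new) - r_i, 0.0))


def certify_with_memory(memory, x_new, c_new, r_new):
    """Sketch of Memory-Based Certification for l2-ball regions.

    memory : list of (x_i, c_i, r_i) tuples, updated in place
    Returns the (possibly changed) prediction and (possibly shrunk) radius.
    """
    for x_i, c_i, r_i in memory:
        if c_new == c_i:
            continue  # same prediction: overlapping regions are harmless
        if math.dist(x_i, x_new) < r_i:
            # Case (i): x_new lies in R_i, so adopt C_i and shrink into R_i.
            r_new = largest_in_subset(x_i, r_i, x_new, r_new)
            c_new = c_i
        elif intersect(x_i, r_i, x_new, r_new):
            # Case (ii): the regions overlap, so shrink R_new until it leaves R_i.
            r_new = largest_out_subset(x_i, r_i, x_new, r_new)
    memory.append((x_new, c_new, r_new))
    return c_new, r_new


demo_memory = []
certify_with_memory(demo_memory, [0.0, 0.0], 0, 1.0)  # -> (0, 1.0)
certify_with_memory(demo_memory, [0.5, 0.0], 1, 1.0)  # -> (0, 0.5), case (i)
certify_with_memory(demo_memory, [3.0, 0.0], 1, 2.5)  # -> (1, 2.0), case (ii)
\end{lstlisting}

Note that after case (i) the new ball lies inside $\mathcal{R}_i$, which by assumption does not intersect any stored region with a label other than $\mathcal{C}_i$, so adopting $\mathcal{C}_i$ keeps the memory consistent.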





% \begin{wrapfigure}{r}{0.60\textwidth}\vspace{-0.7cm}
% \begin{minipage}{0.6\textwidth}
% \begin{algorithm}[H]
%   \DontPrintSemicolon
%   \SetKwFunction{FMain}{PostCertify}
%   \SetKwProg{Fn}{Function}{:}{}
%   \Fn{\FMain{\texttt{ImgDict, RadDict, PreDict, NewImg, NewRad, NewPre}}}{
%   \texttt{diff = torch.norm(NewImg - ImgDict)} \\
%   \texttt{overlaps = diff < RadDict + NewRad} \\
%   \texttt{Prob = PreDict[overlaps] != NewPre} \\
%   \If{\texttt{Prob.any()}}{\texttt{Adjust(NewRad, NewPre)} }
% %   \texttt{Prob = PreDict[overlaps] != NewPre}
% %   \If{\texttt{Prob.any()}}{\texttt{Adjust(NewRad, NewPre)}}
% %   }
%  \texttt{ImgDict $\leftarrow$ NewImg} \\
%  \texttt{RadDict $\leftarrow$ NewRad} \\
%  \texttt{PreDict $\leftarrow$ NewPre}
%  }
%   \caption{Post Certification} \label{alg:post_certification}
% \end{algorithm}
% \end{minipage}
% \end{wrapfigure}


% Here we need to elaborate more about the memory based algorithm we used for correcting the data-dependent certification. We might want to mention the infinite memory requirement (it is currently mentioned under limitations in the appendix). We want also to mention and elaborate on all cases of overlap as well.

While the memory-based certification is essential for a sound certification, in none of our experiments on CIFAR10 and ImageNet did we find two differently predicted inputs with intersecting certified regions. That is to say, the certified region stored in the memory for every input is exactly the one granted by the Monte Carlo certificate of \cite{cohen2019certified} for the data dependent classifier. We hypothesize that this is due to the following reasons: \textbf{(i)} Image datasets are very high dimensional, so samples are typically far apart compared with the certified radii that randomized smoothing can provide. Thus, it is very unlikely to find two samples with intersecting certified regions. \textbf{(ii)} In the rare case where two inputs are close enough that their certified regions intersect, we found that the data dependent classifier $g_\theta$ assigns them the same prediction (the left example of Figure \ref{fig:memory-based-algorithm}). This is because the data dependent classifier is trained to output smooth predictions, \ie its prediction changes little under small input changes, so nearby inputs share a prediction. \textbf{(iii)} To maintain reasonable clean test accuracy, the values of $\sigma$, and correspondingly the optimized $\sigma_x^*$, used in smoothing are moderately low ($\sigma_x^* \leq 1.0$). This results in small certified regions, namely $\ell_2$-balls of radii at most roughly $4\sigma_x^*$, which are much smaller than the distances between inputs in high dimensional data (\eg ImageNet). 
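
To put point (iii) in numbers, recall that the certification procedure of \cite{cohen2019certified} returns the radius $\sigma\,\Phi^{-1}(\underline{p_A})$, where $\underline{p_A}$ is a one-sided Clopper-Pearson lower confidence bound on $p_A$. Even when all $n$ noisy samples vote for the top class, this bound equals $\alpha^{1/n}$, which caps the achievable radius. The sketch below computes this cap; the values $n=100{,}000$ and $\alpha=0.001$ are assumed here for illustration, matching those commonly used with the code of \cite{cohen2019certified}, and the function name is hypothetical.

\begin{lstlisting}[language=Python]
from statistics import NormalDist


def max_certified_radius(sigma, n=100_000, alpha=0.001):
    """Largest radius sigma * Phi^{-1}(p_A_lower) reachable with n Monte Carlo samples.

    If all n noisy samples vote for the top class, the one-sided Clopper-Pearson
    lower bound on p_A at level alpha is the alpha-quantile of Beta(n, 1),
    which equals alpha ** (1 / n). No larger lower bound is possible.
    """
    p_a_lower = alpha ** (1.0 / n)
    return sigma * NormalDist().inv_cdf(p_a_lower)


for s in (0.25, 0.50, 1.00):
    print(f"sigma = {s:.2f}: radius cap = {max_certified_radius(s):.2f}")
\end{lstlisting}

Under these settings the cap is about $3.8\,\sigma_x^*$, so for $\sigma_x^* \leq 1.0$ no certified $\ell_2$-ball has a radius larger than roughly $3.8$.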

\textcolor{black}{It is worth mentioning that, while the memory-based certificate could in principle work on its own, \ie without the data dependent smooth classifier and with an arbitrary certification radius assigned to every input, this results in a sub-optimal certification, since it leads to one of the following situations: \textbf{(i)} Assigning large radii to every input yields a classifier that is very robust but inaccurate. This is because many points certified later are likely to fall in earlier certification regions, requiring either changing their prediction (yielding inaccurate predictions) or reducing their certification radius. The certified accuracy of such a classifier, which accounts for both accuracy and robustness, is therefore very poor. \textbf{(ii)} Conversely, assigning small certified radii to every input results in a highly accurate classifier with very low robustness, and hence also in a very small certified accuracy at large radii. We therefore combine the memory-based certificate with our data dependent smooth classifier, which has a better robustness/accuracy trade-off.  
}
% \textcolor{black}{It is worthwhile to mention that while this approach is mandatory for maintaining a valid a sound certification as discussed earlier, we found that in general certified radius, we found that in practice it does not adjust the radius for any sample in neither CIFAR10 nor ImageNet in all of our experiments. We hypothesize that this is due to the following reasons: \textbf{(i)} Image datasets have very high dimensionality compared with the certified radius that randomized smoothing could provide. Thus, it is very unlikely to find two instances that have overlapping certified radii. \textbf{(ii)} For the case of overlapping instances, we found that the classifier predicts these instances with the same label (the right part of Figure \ref{fig:memory-based-algorithm}). This is since the base classifier is trained to output smooth prediction, relatively close samples to each other will share the predicted label. \textbf{(iii)} To maintain reasonable test accuracy on clean samples, the values of $\sigma$ used in smoothing are moderately low $(\sigma \leq 1.0)$. This results in limited certified radius $(R < 4\sigma)$ which is much smaller than the distance between instances in very high dimensions (\eg ImageNet). We refer interested readers to the appendix for more elaboration with a detailed implementation of our procedure. }


For completeness, we provide the full implementation of the memory-based algorithm using PyTorch.


\lstinputlisting[language=Python]{ICLR22/post_certificate.py}


% \begin{figure}
%     \centering
%     \includegraphics{ICLR22/new_figures/post_certificate.png}
%     \caption{Effect of running post certificate algorithm on overlapping certified regions.}
%     \label{fig:post_certificate_figure}
% \end{figure}

\section{Additional Visualizations}

Here, we show additional results similar to those in Figure \ref{fig:qualitative}. Consistent with the earlier observations, while the model parameters are fixed, the optimal smoothing parameters vary per sample. 

% \newpage


\input{NeurIPS21/figs/appendix_visualization_cohen}
\input{NeurIPS21/figs/appendix_visualization_smoothadv}

\clearpage


% \bibi{add in the formal counter example we have written in the rebuttal of ancer?}
% \bibi{we also mention that we want to add a figure?}


% \textcolor{black}{\textbf{Runtime.} We measure the certification runtime on an NVIDIA Quadro RTX-6000 GPU. To certify a CIFAR10 test sample with ResNet18, it takes 1.6 and 1.7 seconds for a fixed $\sigma$ and data dependent smoothing $\sigma_x^*$ (with $K=900$), respectively. Moreover, it takes 109.5 and 135.8 seconds to certify an ImageNet test sample with ResNet50 (with $K=400$). Note that the runtime overhead added by using Algorithm \ref{alg:RS_DS} is negligible compared to the certification performance gains.}



% \section{Data-Dependent Smoothing for $\ell_1$ Certificates.}

% \textcolor{black}{While indeed we focused both our methodology and experiments on $\ell_2$ certificates, our methodology is extendable to any other $\ell_p$ certificate. For that regard and as per reviewer's request, we conducted experiments on $\ell_1$ certification. We leveraged the results of \cite{yang2020randomized} that derived the tightest $\ell_1$ certificate using randomized smoothing with uniform distribution $\mathcal U[-\lambda, \lambda]^d$. The certified radius in that case has the form $\mathcal R_1 = \lambda(p_A - p_B)$. We replace our objective in Equation \eqref{eq:our_objective_v2} with:
% \[
% \lambda_x^* = \text{arg}\max_\lambda \lambda \left(\mathbb E_{\epsilon\sim \mathcal{U}[-\lambda, \lambda]^d}(f_\theta^{c_A}(x+\epsilon)) - \max_{c \neq c_A} \mathbb E_{\epsilon\sim \mathcal{U}[-\lambda, \lambda]^d}(f_\theta^{c}(x+\epsilon))  \right).
% \]
% We solved our objective in an identical fashion to our Algorithm 1 with the same hyperparameters for $\lambda \in \{0.25, 0.5, 1.0 \}$ in certification on both CIFAR10 and ImageNet. Further, we combine our data-dependent smooth classifier with the memory based algorithm proposed in Section 3.5. It is worthwhile mentioning that similar to the $\ell_2$ case, the memory based algorithm did not find any overlap between the certified regions of any pair of instances.
% We report the results in Figure \ref{fig:yang} and Table \ref{tab:l1}. We observe that, similar to our extensive experiments on the $\ell_2$ certificate, our proposed memory-enhanced data-dependent smoothing yields consistent improvement in the $\ell_1$ certified accuracy. We report an improvement  of 7\% and 3\% over the state of the art certified accuracy at $\ell_1$ radius of 0.5 on CIFAR10 and ImageNet, respectively. At last, we note similar improvement to the $\ell_1$ ACR as reported in Table \ref{tab:l1}.
% }



% \begin{figure*}[t]
%      \centering
%      \begin{subfigure}[b]{0.32\textwidth}
%          \centering
%          \includegraphics[width=\textwidth]{ICLR22/new_figures/l1_cifar10-0.25.png}
%      \end{subfigure}
%     %  \hfill
%      \begin{subfigure}[b]{0.32\textwidth}
%          \centering
%          \includegraphics[width=\textwidth]{ICLR22/new_figures/l1_cifar10-0.5.png}
%      \end{subfigure}
%     %  \hfill
%      \begin{subfigure}[b]{0.32\textwidth}
%          \centering
%          \includegraphics[width=\textwidth]{ICLR22/new_figures/l1_cifar10-1.0.png}
%      \end{subfigure}
%     % \hfill
%     \begin{subfigure}[b]{0.32\textwidth}
%          \centering
%          \includegraphics[width=\textwidth]{ICLR22/new_figures/l1_imagenet-0.25.png}
%      \end{subfigure}
%     %  \hfill
%      \begin{subfigure}[b]{0.32\textwidth}
%          \centering
%          \includegraphics[width=\textwidth]{ICLR22/new_figures/l1_imagenet-0.5.png}
%      \end{subfigure}
%     %  \hfill
%      \begin{subfigure}[b]{0.32\textwidth}
%          \centering
%          \includegraphics[width=\textwidth]{ICLR22/new_figures/l1_imagenet-1.0.png}
%      \end{subfigure}
%         \caption{
%         \textcolor{black}{ 
%         \textbf{$\ell_1$ Certified accuracy comparison against $\text{Yang}$ per radius per $\sigma$.} We compare $\text{Yang}$ against
%         % our data dependent certification
%         $\text{Yang-DS}$.
%         % and when data dependency is incorporated in both training and certification
%         % for several $\sigma$. 
%       We show CIFAR10 and ImageNet results in first and second rows, respectively. Similar to the earlier experiments on $\ell_2$ certificate, deploying data-dependent smoothing with the memory enhanced classifier yields significant improvement for the $\ell_1$ certified accuracy in all considered scenarios. }}
%         % , where the last column is the envelope.}
%         \label{fig:yang}
% \end{figure*}

% \input{CVPR21/tables/l1}





\section{Where are the good $\sigma^*_x$?}

\begin{figure}
    \centering
    \includegraphics[width=0.5\textwidth]{UAI22/rebuttal_figures/cohen_comparison_0.25.pdf}
    \caption{\textbf{Where are the good $\sigma_x^*$?} Histogram of the optimized $\sigma_x^*$ on CIFAR10 for the Cohen baseline at $\sigma=0.25$ (orange), together with the histogram of the $\sigma_x^*$ at which the certified radius is improved (green).}
    \label{fig:final-experiment}
\end{figure}


Finally, a natural question is which values of $\sigma_x^*$ yield better certified radii.
To answer it, we conduct the following experiment on the Cohen baseline at $\sigma=0.25$.
We plot in orange the histogram of the obtained $\sigma_x^*$ on CIFAR10, and in green the histogram of the $\sigma_x^*$ at which the certified radius is improved. We report the results in Figure~\ref{fig:final-experiment}.
We find that the improvements in certified robustness occur across the full range of $\sigma_x^*$, showing the efficacy of our proposed data dependent smoothing.

\section{Runtime}
\textcolor{black}{We measure the certification runtime on an NVIDIA Quadro RTX-6000 GPU for our proposed data dependent smooth classifier (including Algorithm \ref{alg:RS_DS} and the memory-based certification) and compare it to the certification of a fixed-$\sigma$ classifier. Certifying one CIFAR10 test input with ResNet18 takes 1.6 seconds for a fixed-$\sigma$ classifier and an average of 1.8 seconds for the data dependent classifier ($K = 900$). Certifying one ImageNet test input with ResNet50 takes 109.5 seconds for a fixed-$\sigma$ classifier and an average of 136 seconds for our data dependent classifier ($K=400$). The runtime overhead of
Algorithm \ref{alg:RS_DS} and the memory-based certification (roughly 12.5\% on CIFAR10 and 24\% on ImageNet) is small compared to the gains in certified accuracy.}

\section{Limitations, Broader Impact, and Compute Resources}

\paragraph{Limitations.} Similar to any certification framework, the main limitation of this work is the running time required to compute the certified radius. The proposed memory-based certification comes at a cost in both memory and computation. While the memory cost is of order $\mathcal{O}(N)$, the computational complexity is more involved. Let $p$ be the probability that a new point $x_{N+1}$ lies in one of the certification regions $\mathcal{R}_{i}$, and let $n$ be the complexity of computing the certification region $\mathcal{R}_{N+1}$ at $x_{N+1}$ using the classical Monte Carlo algorithm of \cite{cohen2019certified}. Then the expected computational complexity of prediction or certification is $\mathcal{O}(N p+(1-p)(2 N+n))$. The factor $2N+n$ is due to performing $N$ comparisons to check that $x_{N+1}$ is not in any $\mathcal{R}_{i}$, then computing $\mathcal{R}_{N+1}$ at a cost of $n$, and finally a cost of $N$ for computing $\mathcal{R}_{N+1}^{\prime}$. Informally, for large $N$ and low dimensional inputs, $p \approx 1$, leading to a complexity of order $N$. When $N$ is small relative to the input dimension, $p \approx 0$ and the expected complexity is of order $2 N+n$.
\textcolor{black}{Moreover, the memory-based data dependent smooth classifier is order dependent, \ie its certified accuracy depends on the order in which the test inputs are presented. However, this is not the case if the certified regions of differently predicted inputs never overlap, \ie if the middle and right scenarios of Figure \ref{fig:memory-based-algorithm} do not occur. We found this to be the case in all of our experiments, which makes our memory-enhanced data dependent smooth classifier order invariant in practice. We elaborate on why we believe this is the case in Appendix C.}
We leave the design of more practical and efficient algorithms that validate the soundness of data dependent certification to future work.
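
The expected cost $\mathcal{O}(Np+(1-p)(2N+n))$ stated in the Limitations paragraph can be evaluated directly. The following hypothetical helper simply transcribes this expression and illustrates the two regimes discussed above; the numbers in the usage lines are chosen for illustration only and are not measurements from our experiments.

\begin{lstlisting}[language=Python]
def expected_memory_cost(N, p, n):
    """Expected cost N*p + (1 - p)*(2N + n) of memory-based prediction/certification.

    N : number of entries stored in the memory
    p : probability that a new input falls inside a stored certified region
    n : cost of a Monte Carlo certification of the new input
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return N * p + (1.0 - p) * (2 * N + n)


# Illustrative regimes (numbers chosen for illustration only).
print(expected_memory_cost(N=10_000, p=1.0, n=100_000))  # 10000.0, order N
print(expected_memory_cost(N=10_000, p=0.0, n=100_000))  # 120000.0, order 2N + n
\end{lstlisting}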

\paragraph{Broader Impact.} While deep neural networks achieve strong performance in several fields, the existence of adversarial examples hinders their deployment in many applications. This has drawn attention to building networks that are not only accurate but also robust to such perturbations. This work takes a step towards addressing this nuisance by improving the certified robustness of deep neural networks. 



\paragraph{Compute Resources.} In our experiments on CIFAR10, we used either an NVIDIA Quadro RTX-6000 GPU or an NVIDIA GTX 1080 Ti GPU. For ImageNet experiments, we used an NVIDIA V100 GPU. A single GPU was sufficient to run each of our experiments.


\section{Detailed Ablations}

\subsection{Cohen vs Cohen-DS vs Cohen-DS$^2$}
In this section, we report the certified accuracy per radius of \cite{cohen2019certified} for every training $\sigma$, and of Cohen-DS and Cohen-DS$^2$ for every $\sigma$ and number of iterations $K$ in Algorithm \ref{alg:RS_DS}, on both CIFAR10 and ImageNet.



% Cohen on CIFAR10
\begin{table*}[ht]
\small
\centering
\caption{\textbf{Certified accuracy per radius on CIFAR10.} We compare $\text{Cohen}$ against $\text{Cohen-DS}$ under varying $\sigma$ and number of iterations $K$ in Algorithm \ref{alg:RS_DS}.}
\centering
\begin{tabular}{c|cc| ccccccccccc }
\toprule
        & \multicolumn{2}{c|}{\textcolor{black}{$\ell_2^r$ (CIFAR10)}}  & \multirow{1}{*}{0.0} & \multirow{1}{*}{0.25} & \multirow{1}{*}{0.50} & \multirow{1}{*}{0.75} & \multirow{1}{*}{1.00} & \multirow{1}{*}{1.25} & \multirow{1}{*}{1.50} & \multirow{1}{*}{1.75} & \multirow{1}{*}{2.00}& \multirow{1}{*}{2.25} & \multirow{1}{*}{2.50}\\
     \midrule
      \parbox[t]{2mm}{\multirow{3}{*}{\rotatebox[origin=c]{90}{Cohen}}}  
    & \multicolumn{2}{c|}{$\sigma$ = 0.12} & 79.89& 56.26& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
    & \multicolumn{2}{c|}{$\sigma$ = 0.25} & 74.45& 58.34& 40.13& 22.85& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
    & \multicolumn{2}{c|}{$\sigma$ = 0.50} & 63.72& 52.15& 40.13& 29.17& 20.18& 13.08& 7.33& 3.33& 0.0& 0.0& 0.0\\
\midrule
 \parbox[t]{2mm}{\multirow{27}{*}{\rotatebox[origin=c]{90}{Cohen-DS}}}  & $\sigma = $0.12 & K=100 & 77.19& 61.27& 20.8& 5.47& 1.23& 0.02& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=200 & 74.98& 60.67& 19.75& 4.05& 0.94& 0.22& 0.01& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=300 & 73.56& 60.08& 19.63& 4.37& 1.12& 0.36& 0.08& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=400 & 72.11& 59.38& 19.58& 4.27& 1.39& 0.58& 0.13& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=500 & 70.78& 58.77& 19.5& 4.77& 1.51& 0.68& 0.14& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=600 & 70.17& 58.46& 19.51& 4.75& 1.63& 0.85& 0.18& 0.03& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=700 & 69.83& 58.25& 19.91& 5.05& 1.83& 0.88& 0.21& 0.04& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=800 & 69.25& 57.97& 19.75& 5.04& 1.99& 0.95& 0.17& 0.03& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=900 & 68.27& 57.51& 19.91& 5.07& 1.94& 0.93& 0.21& 0.04& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=100 & 73.17& 64.54& 47.48& 22.58& 6.53& 1.82& 0.47& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=200 & 71.62& 64.2& 47.3& 21.66& 5.45& 1.15& 0.34& 0.13& 0.06& 0.01& 0.0\\
 & $\sigma = $0.25 & K=300 & 70.23& 63.91& 47.44& 21.75& 5.38& 1.37& 0.43& 0.21& 0.13& 0.04& 0.02\\
 & $\sigma = $0.25 & K=400 & 69.41& 63.42& 47.43& 22.23& 5.83& 1.38& 0.49& 0.26& 0.1& 0.04& 0.02\\
 & $\sigma = $0.25 & K=500 & 68.88& 63.53& 47.56& 22.19& 5.8& 1.54& 0.53& 0.26& 0.1& 0.06& 0.03\\
 & $\sigma = $0.25 & K=600 & 68.09& 63.21& 47.78& 22.05& 6.16& 1.59& 0.51& 0.25& 0.12& 0.07& 0.02\\
 & $\sigma = $0.25 & K=700 & 67.57& 63.02& 47.6& 22.25& 5.97& 1.63& 0.57& 0.29& 0.11& 0.04& 0.02\\
 & $\sigma = $0.25 & K=800 & 67.36& 62.93& 47.64& 22.04& 6.29& 1.62& 0.6& 0.27& 0.11& 0.04& 0.02\\
 & $\sigma = $0.25 & K=900 & 67.22& 62.93& 47.45& 22.55& 6.19& 1.62& 0.54& 0.26& 0.11& 0.05& 0.03\\
 & $\sigma = $0.50 & K=100 & 63.18& 55.88& 47.07& 37.2& 26.56& 16.43& 8.0& 3.21& 1.23& 0.55& 0.19\\
 & $\sigma = $0.50 & K=200 & 61.26& 55.08& 47.25& 37.86& 27.25& 16.49& 7.49& 2.56& 1.07& 0.53& 0.23\\
 & $\sigma = $0.50 & K=300 & 59.52& 54.25& 47.35& 38.28& 27.29& 16.23& 7.16& 2.39& 0.96& 0.48& 0.24\\
 & $\sigma = $0.50 & K=400 & 58.29& 53.67& 47.19& 38.05& 27.45& 16.39& 7.41& 2.44& 0.93& 0.45& 0.24\\
 & $\sigma = $0.50 & K=500 & 57.46& 53.53& 47.38& 38.28& 27.47& 16.45& 7.38& 2.38& 0.87& 0.48& 0.24\\
 & $\sigma = $0.50 & K=600 & 56.68& 53.11& 47.04& 38.21& 27.47& 16.34& 7.21& 2.37& 1.03& 0.55& 0.32\\
 & $\sigma = $0.50 & K=700 & 55.83& 52.37& 46.88& 38.12& 27.43& 16.37& 7.21& 2.3& 1.01& 0.57& 0.37\\
 & $\sigma = $0.50 & K=800 & 55.26& 52.11& 46.8& 38.3& 27.26& 16.19& 7.18& 2.45& 1.04& 0.62& 0.4\\
 & $\sigma = $0.50 & K=900 & 54.83& 51.83& 46.62& 38.15& 27.55& 16.5& 7.37& 2.52& 1.21& 0.69& 0.5\\
 \bottomrule
\end{tabular}\vspace{-10pt}
\end{table*}
\begin{table*}[ht]
\small
\centering
\caption{\textbf{Certified accuracy per radius on CIFAR10.} We report $\text{Cohen-DS}^2$ under varying $\sigma$ and number of iterations $K$ in Algorithm \ref{alg:RS_DS}.}
\centering
\begin{tabular}{c|cc| ccccccccccc }
\toprule 
      & \multicolumn{2}{c|}{\textcolor{black}{$\ell_2^r$ (CIFAR10)}}  & \multirow{1}{*}{0.0} & \multirow{1}{*}{0.25} & \multirow{1}{*}{0.50} & \multirow{1}{*}{0.75} & \multirow{1}{*}{1.00} & \multirow{1}{*}{1.25} & \multirow{1}{*}{1.50} & \multirow{1}{*}{1.75} & \multirow{1}{*}{2.00}& \multirow{1}{*}{2.25} & \multirow{1}{*}{2.50}\\
\midrule
 \parbox[t]{2mm}{\multirow{60}{*}{\rotatebox[origin=c]{90}{Cohen-DS$^2$}}}  & $\sigma = $0.12 & K=100 & 79.8& 60.56& 26.87& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=200 & 79.83& 62.05& 28.13& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=300 & 79.74& 62.81& 26.96& 0.03& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=400 & 79.56& 63.07& 25.47& 6.66& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=500 & 79.4& 63.24& 24.23& 7.74& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=600 & 79.14& 63.23& 23.58& 7.5& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=700 & 78.95& 63.34& 22.96& 7.12& 0.86& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=800 & 78.77& 63.34& 22.57& 6.48& 1.26& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=900 & 79.06& 64.6& 22.69& 6.61& 1.68& 0.01& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=1000 & 79.02& 64.54& 22.27& 6.27& 1.69& 0.02& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=1100 & 78.81& 64.41& 21.9& 5.89& 1.58& 0.22& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=1200 & 78.7& 64.37& 21.88& 5.45& 1.42& 0.27& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=1300 & 78.53& 64.39& 21.67& 5.15& 1.31& 0.26& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=1400 & 78.39& 64.46& 21.55& 4.96& 1.13& 0.29& 0.01& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=1500 & 78.31& 64.41& 21.56& 4.73& 1.05& 0.3& 0.03& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=100 & 74.99& 61.47& 43.92& 24.54& 7.93& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=200 & 75.13& 63.21& 45.94& 25.75& 9.35& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=300 & 75.0& 64.03& 46.96& 25.96& 10.05& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=400 & 75.04& 64.58& 47.59& 25.63& 9.93& 1.92& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=500 & 74.79& 64.9& 47.85& 25.41& 9.42& 2.6& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=600 & 74.7& 65.15& 48.38& 25.05& 8.88& 2.69& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=700 & 74.51& 65.35& 48.47& 24.71& 8.34& 2.62& 0.01& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=800 & 74.46& 65.42& 48.5& 24.72& 7.98& 2.43& 0.52& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=900 & 74.58& 66.42& 50.23& 25.57& 8.25& 2.83& 0.74& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=1000 & 74.39& 66.47& 50.17& 25.41& 7.9& 2.63& 0.75& 0.01& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=1100 & 74.2& 66.42& 50.31& 25.13& 7.65& 2.41& 0.72& 0.14& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=1200 & 74.12& 66.37& 50.37& 24.92& 7.36& 2.31& 0.65& 0.18& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=1300 & 73.98& 66.41& 50.38& 24.75& 7.14& 2.25& 0.58& 0.19& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=1400 & 73.81& 66.39& 50.41& 24.79& 6.85& 2.2& 0.51& 0.15& 0.03& 0.0& 0.0\\
 & $\sigma = $0.25 & K=1500 & 73.67& 66.33& 50.31& 24.63& 6.68& 2.03& 0.54& 0.19& 0.03& 0.0& 0.0\\
 & $\sigma = $0.50 & K=100 & 63.92& 53.49& 42.6& 31.83& 22.15& 14.12& 7.48& 3.51& 0.0& 0.0& 0.0\\
 & $\sigma = $0.50 & K=200 & 64.14& 54.36& 44.3& 33.52& 23.79& 15.14& 7.93& 3.6& 1.09& 0.0& 0.0\\
 & $\sigma = $0.50 & K=300 & 64.21& 54.95& 45.32& 35.04& 24.71& 15.81& 8.16& 3.65& 1.36& 0.0& 0.0\\
 & $\sigma = $0.50 & K=400 & 64.22& 55.56& 45.92& 35.86& 25.45& 16.28& 8.35& 3.79& 1.41& 0.0& 0.0\\
 & $\sigma = $0.50 & K=500 & 64.14& 55.84& 46.35& 36.29& 25.88& 16.56& 8.39& 3.69& 1.42& 0.3& 0.0\\
 & $\sigma = $0.50 & K=600 & 64.14& 56.07& 46.7& 36.71& 26.18& 16.76& 8.48& 3.61& 1.41& 0.45& 0.0\\
 & $\sigma = $0.50 & K=700 & 64.04& 56.2& 46.94& 37.09& 26.54& 16.76& 8.4& 3.63& 1.37& 0.45& 0.0\\
 & $\sigma = $0.50 & K=800 & 63.93& 56.32& 47.23& 37.33& 26.69& 16.91& 8.35& 3.46& 1.28& 0.5& 0.09\\
 & $\sigma = $0.50 & K=900 & 64.26& 57.26& 48.27& 38.85& 28.41& 17.97& 8.82& 3.66& 1.37& 0.58& 0.14\\
 & $\sigma = $0.50 & K=1000 & 64.06& 57.26& 48.41& 38.96& 28.49& 18.1& 8.65& 3.64& 1.33& 0.6& 0.21\\
 & $\sigma = $0.50 & K=1100 & 63.72& 57.21& 48.47& 39.0& 28.69& 18.26& 8.57& 3.55& 1.36& 0.59& 0.2\\
 & $\sigma = $0.50 & K=1200 & 63.56& 57.15& 48.67& 38.96& 28.81& 18.18& 8.53& 3.52& 1.32& 0.58& 0.23\\
 & $\sigma = $0.50 & K=1300 & 63.29& 57.01& 48.81& 39.07& 28.98& 18.31& 8.44& 3.44& 1.3& 0.6& 0.21\\
 & $\sigma = $0.50 & K=1400 & 63.09& 56.9& 48.88& 39.11& 29.07& 18.22& 8.62& 3.3& 1.34& 0.56& 0.22\\
 & $\sigma = $0.50 & K=1500 & 62.94& 56.87& 48.9& 39.21& 29.04& 18.1& 8.55& 3.27& 1.28& 0.53& 0.23\\
 \bottomrule
\end{tabular}\vspace{-10pt}
\end{table*}
% Cohen on Imagenet
\begin{table*}[ht]
\small
\centering
\caption{\textbf{Certified accuracy per radius on ImageNet.} We compare $\text{Cohen}$ against $\text{Cohen-DS}$ and $\text{Cohen-DS}^2$ under varying $\sigma$ and number of iterations $K$ in Algorithm \ref{alg:RS_DS}.}
\centering
\begin{tabular}{c|cc| ccccccccccc }
\toprule
        & \multicolumn{2}{c|}{\textcolor{black}{$\ell_2^r$ (ImageNet)}} & \multirow{1}{*}{0.0} & \multirow{1}{*}{0.25} & \multirow{1}{*}{0.50}& \multirow{1}{*}{0.75}  & \multirow{1}{*}{1.00} & \multirow{1}{*}{1.50} & \multirow{1}{*}{2.00} & \multirow{1}{*}{2.50} & \multirow{1}{*}{3.00}& \multirow{1}{*}{3.50}& \multirow{1}{*}{4.00} \\
     \midrule
      \parbox[t]{2mm}{\multirow{3}{*}{\rotatebox[origin=c]{90}{Cohen}}}
        & \multicolumn{2}{c|}{$\sigma$ = 0.25} & 66.6& 58.2& 49.0& 38.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
        & \multicolumn{2}{c|}{$\sigma$ = 0.50} & 57.2& 51.4& 45.8& 42.4& 37.4& 27.8& 0.0& 0.0& 0.0& 0.0& 0.0\\
        & \multicolumn{2}{c|}{$\sigma$ = 1.0} & 43.6& 40.6& 37.8& 35.4& 32.6& 25.8& 19.4& 14.4& 12.0& 8.6& 0.0\\
\midrule
 \parbox[t]{2mm}{\multirow{12}{*}{\rotatebox[origin=c]{90}{Cohen-DS}}}   & $\sigma = $0.25 & K=100 & 67.8& 61.0& 53.6& 42.8& 18.8& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=200 & 67.0& 61.4& 53.6& 43.0& 18.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=300 & 66.8& 61.2& 53.4& 42.2& 18.6& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=400 & 66.2& 61.4& 53.2& 42.2& 18.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.50 & K=100 & 58.4& 54.0& 48.2& 45.2& 40.6& 30.4& 1.8& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.50 & K=200 & 58.0& 53.4& 48.2& 45.2& 41.4& 29.8& 9.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.50 & K=300 & 58.0& 54.0& 48.8& 45.4& 41.4& 30.2& 9.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.50 & K=400 & 57.8& 53.8& 48.8& 45.6& 42.0& 30.4& 8.2& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $1.0 & K=100 & 45.0& 42.6& 40.4& 39.0& 36.4& 29.6& 22.4& 17.8& 13.8& 10.0& 0.2\\
 & $\sigma = $1.0 & K=200 & 45.2& 43.0& 41.8& 39.4& 36.8& 29.6& 23.0& 18.6& 14.2& 10.2& 0.6\\
 & $\sigma = $1.0 & K=300 & 45.0& 43.4& 41.2& 39.6& 37.2& 30.0& 23.4& 18.8& 14.4& 9.4& 2.0\\
 & $\sigma = $1.0 & K=400 & 44.8& 43.2& 41.4& 39.6& 37.2& 30.4& 23.2& 18.8& 14.6& 9.8& 1.8\\
 \midrule
 \parbox[t]{2mm}{\multirow{12}{*}{\rotatebox[origin=c]{90}{Cohen-DS$^2$}}}    & $\sigma = $0.25 & K=100 & 67.2& 64.2& 58.4& 45.4& 17.8& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=200 & 66.8& 64.2& 58.2& 45.6& 18.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=300 & 66.6& 64.2& 58.0& 45.2& 18.4& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=400 & 67.4& 64.2& 58.2& 45.0& 18.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.50 & K=100 & 58.0& 55.2& 51.6& 46.2& 41.2& 30.2& 2.2& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.50 & K=200 & 57.6& 55.2& 51.8& 47.0& 41.8& 30.4& 8.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.50 & K=300 & 57.6& 55.0& 51.8& 46.8& 41.8& 30.4& 8.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.50 & K=400 & 57.4& 55.4& 51.6& 47.4& 41.8& 30.6& 8.2& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $1.0 & K=100 & 46.4& 44.6& 41.4& 38.6& 37.2& 31.4& 24.8& 20.6& 16.6& 11.0& 0.4\\
 & $\sigma = $1.0 & K=200 & 46.6& 44.4& 42.0& 39.2& 37.6& 31.2& 25.0& 20.8& 17.0& 10.8& 0.4\\
 & $\sigma = $1.0 & K=300 & 46.0& 44.6& 41.8& 39.2& 37.4& 31.4& 24.6& 20.8& 17.2& 11.0& 1.8\\
 & $\sigma = $1.0 & K=400 & 46.8& 45.0& 42.6& 39.4& 37.6& 31.8& 24.8& 21.2& 16.8& 11.0& 2.0\\
 \bottomrule
\end{tabular}\vspace{-10pt}
\end{table*}



% \newpage

\subsection{SmoothAdv vs SmoothAdv-DS vs SmoothAdv-DS$^2$}
In the same spirit as the previous section, we report the certified accuracy of the SmoothAdv variants, namely SmoothAdv \cite{salman2019provably}, SmoothAdv-DS, and SmoothAdv-DS$^2$, on CIFAR10 and ImageNet.
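For completeness, we recall how each entry of the tables in this appendix is aggregated. Following the common convention of \cite{cohen2019certified}, the certified accuracy at radius $r$ is the percentage of test points that are classified correctly and certified with a radius of at least $r$; abstentions count as errors. The snippet below is a minimal sketch of this aggregation rather than our evaluation code; the function name \texttt{certified\_accuracy}, the abstention marker \texttt{ABSTAIN}, and the input format (one prediction, label, and certified radius per test point) are illustrative assumptions.

```python
from typing import List, Sequence

ABSTAIN = -1  # hypothetical marker for an abstained prediction


def certified_accuracy(preds: Sequence[int], labels: Sequence[int],
                       radii: Sequence[float], grid: Sequence[float]) -> List[float]:
    """Percentage of test points that are correct AND certified at radius >= r, per r in grid."""
    n = len(labels)
    accs = []
    for r in grid:
        hits = sum(
            1
            for p, y, rad in zip(preds, labels, radii)
            if p != ABSTAIN and p == y and rad >= r  # abstentions count as errors
        )
        accs.append(100.0 * hits / n)
    return accs


# Toy usage with four hypothetical test points.
print(certified_accuracy([0, 1, -1, 2], [0, 1, 1, 0], [0.6, 0.3, 0.0, 1.0], [0.0, 0.25, 0.5]))
# prints [50.0, 50.0, 25.0]
```

Since the predicate becomes stricter as $r$ grows, each row of the tables is non-increasing from left to right.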


% SmoothAdv on Cifar10
\begin{table*}[ht]
\footnotesize
\centering
\caption{\textbf{Certified accuracy per radius on CIFAR10.} We compare $\text{SmoothAdv}$ against $\text{SmoothAdv-DS}$ and $\text{SmoothAdv-DS}^2$ under varying $\sigma$ and number of iterations $K$ in Algorithm \ref{alg:RS_DS}.}
\centering
\begin{tabular}{c|cc| ccccccccccc }
\toprule
        &  \multicolumn{2}{c|}{\textcolor{black}{$\ell_2^r$ (CIFAR10)}}  & \multirow{1}{*}{0.0} & \multirow{1}{*}{0.25} & \multirow{1}{*}{0.50} & \multirow{1}{*}{0.75} & \multirow{1}{*}{1.00} & \multirow{1}{*}{1.25} & \multirow{1}{*}{1.50} & \multirow{1}{*}{1.75} & \multirow{1}{*}{2.00}& \multirow{1}{*}{2.25} & \multirow{1}{*}{2.50}\\
     \midrule
      \parbox[t]{2mm}{\multirow{3}{*}{ \rotatebox[origin=c]{90}{SmoothAdv}}}  
    & \multicolumn{2}{c|}{$\sigma$ = 0.12} & 75.97& 62.44& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
    & \multicolumn{2}{c|}{$\sigma$ = 0.25} & 70.82& 59.55& 46.71& 33.66& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
    & \multicolumn{2}{c|}{$\sigma$ = 0.50} & 60.96& 52.6& 43.5& 34.62& 26.53& 19.49& 12.9& 7.47& 0.0& 0.0& 0.0\\
\midrule
 \parbox[t]{2mm}{\multirow{27}{*}{\rotatebox[origin=c]{90}{SmoothAdv-DS}}}   & $\sigma = $0.12 & K=100 & 75.74& 63.58& 40.88& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=200 & 75.7& 64.39& 45.05& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=300 & 75.69& 64.97& 46.13& 0.55& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=400 & 75.73& 65.43& 46.39& 22.49& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=500 & 75.74& 65.75& 46.57& 25.06& 0.03& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=600 & 75.72& 66.04& 46.64& 25.16& 0.24& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=700 & 75.66& 66.23& 46.74& 24.64& 7.26& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=800 & 75.65& 66.3& 46.61& 23.97& 11.54& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=900 & 75.64& 66.44& 46.43& 23.48& 11.75& 0.03& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=100 & 71.34& 60.81& 48.38& 35.14& 17.76& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=200 & 71.32& 61.38& 49.44& 36.24& 20.71& 0.01& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=300 & 71.3& 62.01& 50.16& 36.9& 21.69& 0.28& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=400 & 71.32& 62.45& 50.76& 37.24& 22.33& 8.37& 0.01& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=500 & 71.23& 62.82& 51.27& 37.46& 22.42& 10.67& 0.01& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=600 & 71.26& 63.02& 51.66& 37.66& 22.04& 11.06& 0.06& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=700 & 71.12& 63.26& 51.72& 37.61& 21.82& 11.0& 0.52& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=800 & 71.07& 63.4& 51.94& 37.5& 21.43& 10.6& 4.26& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=900 & 71.04& 63.54& 52.1& 37.38& 21.18& 10.1& 4.63& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.50 & K=100 & 61.16& 53.06& 44.28& 35.42& 27.29& 20.18& 13.6& 7.82& 0.02& 0.0& 0.0\\
 & $\sigma = $0.50 & K=200 & 61.22& 53.44& 44.96& 36.11& 28.0& 20.81& 13.98& 8.25& 2.85& 0.0& 0.0\\
 & $\sigma = $0.50 & K=300 & 61.24& 53.74& 45.39& 36.81& 28.72& 21.21& 14.23& 8.29& 3.29& 0.0& 0.0\\
 & $\sigma = $0.50 & K=400 & 61.22& 53.95& 45.65& 37.29& 29.22& 21.58& 14.53& 8.42& 3.78& 0.05& 0.0\\
 & $\sigma = $0.50 & K=500 & 61.21& 54.15& 46.03& 37.8& 29.54& 21.72& 14.73& 8.43& 3.99& 1.02& 0.0\\
 & $\sigma = $0.50 & K=600 & 61.2& 54.3& 46.42& 38.11& 29.83& 21.94& 14.95& 8.42& 4.07& 1.57& 0.0\\
 & $\sigma = $0.50 & K=700 & 61.23& 54.47& 46.58& 38.39& 30.24& 22.04& 14.95& 8.5& 4.12& 1.77& 0.03\\
 & $\sigma = $0.50 & K=800 & 61.19& 54.58& 46.73& 38.65& 30.39& 22.15& 14.86& 8.49& 4.09& 1.83& 0.43\\
 & $\sigma = $0.50 & K=900 & 61.25& 54.65& 46.88& 38.82& 30.6& 22.19& 14.89& 8.49& 4.17& 1.84& 0.6\\
 \midrule
 \parbox[t]{2mm}{\multirow{27}{*}{\rotatebox[origin=c]{90}{SmoothAdv-DS$^2$}}} 
  & $\sigma = $0.12 & K=100 & 76.04& 63.62& 41.88& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=200 & 76.03& 64.54& 46.4& 0.01& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=300 & 76.0& 65.36& 47.36& 0.82& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=400 & 75.99& 65.85& 47.98& 23.18& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=500 & 76.11& 66.15& 48.16& 26.09& 0.07& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=600 & 76.14& 66.39& 48.13& 26.08& 0.47& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=700 & 76.15& 66.52& 48.1& 25.73& 7.99& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=800 & 76.15& 66.69& 47.84& 25.16& 11.89& 0.02& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=900 & 76.05& 66.77& 47.9& 24.34& 11.82& 0.06& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=100 & 71.2& 60.56& 48.36& 35.19& 17.76& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=200 & 71.27& 61.55& 49.57& 36.45& 21.31& 0.01& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=300 & 71.3& 62.16& 50.75& 37.41& 22.66& 0.48& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=400 & 71.41& 62.76& 51.44& 37.85& 23.18& 9.12& 0.01& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=500 & 71.37& 63.0& 51.89& 38.06& 23.16& 11.37& 0.04& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=600 & 71.37& 63.36& 52.25& 38.31& 22.8& 11.84& 0.16& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=700 & 71.35& 63.45& 52.43& 38.33& 22.58& 11.61& 0.73& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=800 & 71.25& 63.65& 52.67& 38.26& 22.35& 11.17& 4.69& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=900 & 71.21& 63.85& 52.81& 38.21& 22.21& 10.75& 4.92& 0.03& 0.0& 0.0& 0.0\\
 & $\sigma = $0.50 & K=100 & 61.08& 53.0& 44.33& 35.59& 27.49& 20.09& 13.74& 7.98& 0.04& 0.0& 0.0\\
 & $\sigma = $0.50 & K=200 & 61.07& 53.41& 44.95& 36.33& 28.39& 20.86& 14.05& 8.22& 2.9& 0.0& 0.0\\
 & $\sigma = $0.50 & K=300 & 61.1& 53.8& 45.61& 37.13& 28.9& 21.5& 14.32& 8.56& 3.57& 0.0& 0.0\\
 & $\sigma = $0.50 & K=400 & 61.1& 54.14& 46.02& 37.77& 29.3& 21.94& 14.66& 8.63& 3.89& 0.06& 0.0\\
 & $\sigma = $0.50 & K=500 & 61.15& 54.21& 46.52& 38.15& 29.79& 22.21& 14.91& 8.66& 4.23& 1.05& 0.0\\
 & $\sigma = $0.50 & K=600 & 61.2& 54.33& 46.89& 38.59& 30.08& 22.35& 15.01& 8.74& 4.28& 1.56& 0.01\\
 & $\sigma = $0.50 & K=700 & 61.18& 54.56& 47.11& 38.93& 30.4& 22.51& 15.12& 8.85& 4.34& 1.77& 0.03\\
 & $\sigma = $0.50 & K=800 & 61.15& 54.72& 47.41& 39.17& 30.59& 22.56& 15.14& 8.75& 4.31& 1.85& 0.53\\
 & $\sigma = $0.50 & K=900 & 61.12& 54.78& 47.62& 39.32& 30.78& 22.64& 15.14& 8.73& 4.26& 1.94& 0.71\\
 \bottomrule
\end{tabular}\vspace{-10pt}
\end{table*}
% SmoothAdv on Imagenet
\begin{table*}[ht]
\small
\centering
\caption{\textbf{Certified accuracy per radius on ImageNet.} We compare $\text{SmoothAdv}$ against $\text{SmoothAdv-DS}$ and $\text{SmoothAdv-DS}^2$ under varying $\sigma$ and number of iterations $K$ in Algorithm \ref{alg:RS_DS}.}
\centering
\begin{tabular}{c|cc| ccccccccccc }
\toprule
        & \multicolumn{2}{c|}{\textcolor{black}{$\ell_2^r$ (ImageNet)}}  & \multirow{1}{*}{0.0} & \multirow{1}{*}{0.25} & \multirow{1}{*}{0.50}& \multirow{1}{*}{0.75}  & \multirow{1}{*}{1.00} & \multirow{1}{*}{1.50} & \multirow{1}{*}{2.00} & \multirow{1}{*}{2.50} & \multirow{1}{*}{3.00}& \multirow{1}{*}{3.50}& \multirow{1}{*}{4.00} \\
     \midrule
      \parbox[t]{2mm}{\multirow{3}{*}{\rotatebox[origin=c]{90}{SmoothAdv}}} 
    & \multicolumn{2}{c|}{$\sigma$ = 0.25} & 60.8& 57.8& 54.6& 50.4& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
    & \multicolumn{2}{c|}{$\sigma$ = 0.50} & 54.6& 52.6& 48.8& 44.6& 42.2& 35.6& 0.0& 0.0& 0.0& 0.0& 0.0\\
    & \multicolumn{2}{c|}{$\sigma$ = 1.0} & 40.6& 39.6& 38.6& 36.4& 33.6& 29.8& 25.6& 20.4& 18.0& 14.2& 0.0\\
\midrule
 \parbox[t]{2mm}{\multirow{12}{*}{\rotatebox[origin=c]{90}{SmoothAdv-DS}}}   
 & $\sigma = $0.25 & K=100 & 61.6& 59.6& 56.8& 52.6& 31.4& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=200 & 61.6& 59.8& 57.2& 52.8& 35.8& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=300 & 62.0& 60.2& 57.2& 52.8& 36.6& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=400 & 61.8& 60.4& 57.4& 53.2& 36.8& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.50 & K=100 & 55.0& 53.6& 51.2& 47.2& 45.2& 38.0& 4.8& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.50 & K=200 & 55.0& 53.8& 51.6& 48.4& 46.4& 39.2& 16.6& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.50 & K=300 & 55.4& 54.0& 51.6& 48.6& 47.0& 39.2& 18.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.50 & K=400 & 55.2& 54.0& 51.6& 48.8& 47.0& 39.0& 18.6& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $1.0 & K=100 & 41.8& 41.0& 39.4& 37.6& 35.2& 31.6& 28.0& 22.6& 19.2& 15.2& 0.8\\
 & $\sigma = $1.0 & K=200 & 42.4& 41.8& 40.2& 38.4& 36.6& 32.4& 28.8& 23.4& 19.0& 14.6& 1.2\\
 & $\sigma = $1.0 & K=300 & 42.6& 41.8& 40.4& 38.8& 36.8& 32.4& 29.2& 23.8& 19.6& 15.2& 6.2\\
 & $\sigma = $1.0 & K=400 & 42.8& 42.2& 40.8& 38.8& 37.0& 33.2& 29.0& 23.8& 19.6& 14.8& 6.2\\
 \midrule
 \parbox[t]{2mm}{\multirow{12}{*}{\rotatebox[origin=c]{90}{SmoothAdv-DS$^2$}}}     & $\sigma = $0.25 & K=100 & 62.2& 60.4& 58.8& 54.0& 27.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=200 & 62.0& 60.6& 58.6& 54.2& 27.4& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=300 & 62.0& 60.4& 58.8& 54.0& 27.4& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=400 & 61.8& 60.4& 58.8& 54.0& 27.4& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.50 & K=100 & 55.8& 54.2& 52.6& 50.4& 48.2& 43.0& 7.8& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.50 & K=200 & 55.2& 54.0& 51.8& 49.8& 47.8& 42.6& 14.2& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.50 & K=300 & 55.6& 54.0& 52.0& 49.8& 47.8& 42.6& 15.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.50 & K=400 & 55.6& 54.4& 52.2& 50.2& 48.2& 43.0& 15.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $1.0 & K=100 & 44.0& 43.0& 41.2& 40.6& 38.4& 34.6& 30.6& 25.4& 21.6& 18.6& 1.2\\
 & $\sigma = $1.0 & K=200 & 44.4& 43.2& 41.6& 40.6& 38.6& 34.8& 30.6& 25.0& 21.6& 18.4& 1.6\\
 & $\sigma = $1.0 & K=300 & 44.2& 43.0& 41.8& 41.2& 38.6& 34.6& 30.6& 25.2& 21.4& 17.8& 4.2\\
 & $\sigma = $1.0 & K=400 & 43.8& 43.0& 41.0& 40.8& 38.6& 34.6& 30.2& 25.2& 21.4& 18.2& 4.0\\
 \bottomrule
\end{tabular}\vspace{-10pt}
\end{table*}

\clearpage

\subsection{MACER vs MACER-DS vs MACER-DS$^2$ (n=1) vs MACER-DS$^2$ (n=8)}
We report the $\ell_2^r$ certified accuracy per radius $r$ for the MACER \citep{zhai2020macer} variants on CIFAR10. As highlighted in the main manuscript, when data dependent smoothing is used for certification only, \ie $\text{MACER-DS}$, we set $n=8$ in Algorithm \ref{alg:RS_DS} for all experiments. When data dependent smoothing is also employed during training, \ie $\text{MACER-DS}^2$, the main paper sets $n=1$ for ease of computation. Here, we additionally ablate the variant that sets $n=8$ during training. For these variants, which employ data dependent smoothing in both training and certification, we denote the choices $n=1$ and $n=8$ by $\text{MACER-DS}^2(n=1)$ and $\text{MACER-DS}^2(n=8)$, respectively.
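To make the roles of $n$ and $K$ concrete, the snippet below sketches a per-input search for $\sigma_x$ under stated assumptions. It runs $K$ gradient-ascent steps on a Monte Carlo estimate, with $n$ Gaussian samples per step, of the surrogate $\frac{\sigma}{2}\left(\Phi^{-1}(p_A) - \Phi^{-1}(p_B)\right)$, where $p_A$ and $p_B$ are the two largest averaged softmax scores; we assume this form here as a stand-in for the objective in Equation \eqref{eq:our_objective_v2}. It is an illustration in the spirit of Algorithm \ref{alg:RS_DS}, not its implementation: the toy linear-softmax classifier, the helper names (\texttt{toy\_classifier}, \texttt{radius\_estimate}, \texttt{optimize\_sigma}), the step size, the clipping constant, and the central finite difference used in place of automatic differentiation are all hypothetical choices. Both finite-difference evaluations reuse the same noise draws $\delta \sim \mathcal{N}(0, I)$, mirroring the reparameterization $x + \sigma\delta$.

```python
import math
import random
from statistics import NormalDist
from typing import List, Sequence

_PHI_INV = NormalDist().inv_cdf  # standard Gaussian quantile function Phi^{-1}


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def toy_classifier(x: Sequence[float], W: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical soft classifier: linear logits W x followed by softmax."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in W])


def radius_estimate(x, sigma, W, deltas, clip=1e-4):
    """Monte Carlo estimate of sigma/2 * (Phi^-1(p_A) - Phi^-1(p_B)) with fixed noise draws."""
    probs = [0.0] * len(W)
    for d in deltas:  # len(deltas) == n Monte Carlo samples
        p = toy_classifier([xi + sigma * di for xi, di in zip(x, d)], W)
        probs = [acc + pc / len(deltas) for acc, pc in zip(probs, p)]
    p_a, p_b = sorted(probs, reverse=True)[:2]
    # Clip away from {0, 1} so that Phi^-1 stays finite (assumed constant).
    p_a = min(max(p_a, clip), 1 - clip)
    p_b = min(max(p_b, clip), 1 - clip)
    return 0.5 * sigma * (_PHI_INV(p_a) - _PHI_INV(p_b))


def optimize_sigma(x, W, sigma0=0.25, K=100, n=1, lr=0.01, h=1e-3, seed=0):
    """K steps of gradient ascent on the estimated radius w.r.t. sigma (illustrative only)."""
    rng = random.Random(seed)
    sigma = sigma0
    for _ in range(K):
        # n fresh Gaussian draws per iteration, shared by both finite-difference evaluations.
        deltas = [[rng.gauss(0.0, 1.0) for _ in x] for _ in range(n)]
        grad = (radius_estimate(x, sigma + h, W, deltas)
                - radius_estimate(x, sigma - h, W, deltas)) / (2 * h)
        sigma = max(sigma + lr * grad, 1e-3)  # keep sigma strictly positive
    return sigma


# Toy usage: the sigma found with n=1 versus n=8 samples per iteration.
x, W = [2.0, 0.0], [[1.0, 0.0], [-1.0, 0.0]]
print(optimize_sigma(x, W, n=1), optimize_sigma(x, W, n=8))
```

In this sketch, a larger $n$ reduces the variance of each gradient step at a cost that grows linearly in $n$, which is roughly the trade-off that the $n=1$ versus $n=8$ ablation below probes.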

% (Test = 8) MACER on Cifar10 n=1
\begin{table*}[t]
\footnotesize
\centering
\caption{\textbf{Certified accuracy per radius on CIFAR10.} We compare $\text{MACER}$ against $\text{MACER-DS}$ and $\text{MACER-DS}^2(n=1)$ under varying $\sigma$ and number of iterations $K$ in Algorithm \ref{alg:RS_DS}.}
\centering
\begin{tabular}{c|cc| ccccccccccc }
\toprule
        & \multicolumn{2}{c|}{\textcolor{black}{$\ell_2^r$ (CIFAR10)}}  & \multirow{1}{*}{0.0} & \multirow{1}{*}{0.25} & \multirow{1}{*}{0.50} & \multirow{1}{*}{0.75} & \multirow{1}{*}{1.00} & \multirow{1}{*}{1.25} & \multirow{1}{*}{1.50} & \multirow{1}{*}{1.75} & \multirow{1}{*}{2.00}& \multirow{1}{*}{2.25} & \multirow{1}{*}{2.50}\\
     \midrule
      \parbox[t]{2mm}{\multirow{3}{*}{\rotatebox[origin=c]{90}{MACER}}}
& \multicolumn{2}{c|}{$\sigma$ = 0.12} & 78.75& 58.51& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
& \multicolumn{2}{c|}{$\sigma$ = 0.25} & 72.51& 59.25& 43.64& 28.25& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
& \multicolumn{2}{c|}{$\sigma$ = 0.50} & 61.23& 52.52& 43.44& 34.65& 26.57& 19.39& 13.0& 7.5& 0.0& 0.0& 0.0\\
\midrule
 \parbox[t]{2mm}{\multirow{27}{*}{\rotatebox[origin=c]{90}{MACER-DS}}}    & $\sigma = $0.12 & K=100 & 79.21& 60.57& 30.95& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=200 & 79.3& 60.98& 30.18& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=300 & 79.39& 61.33& 27.9& 0.07& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=400 & 79.45& 61.27& 25.62& 10.07& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=500 & 79.48& 61.4& 23.43& 11.02& 0.01& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=600 & 79.44& 61.55& 22.22& 10.66& 0.11& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=700 & 79.5& 61.39& 21.79& 9.94& 3.82& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=800 & 79.47& 61.25& 21.83& 9.33& 5.38& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=900 & 79.48& 61.34& 21.59& 8.89& 6.02& 0.1& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=100 & 73.41& 63.59& 46.37& 27.96& 12.76& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=200 & 73.72& 65.1& 47.51& 27.19& 13.85& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=300 & 73.9& 65.63& 47.81& 26.42& 13.19& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=400 & 73.96& 66.03& 48.12& 25.14& 12.2& 4.17& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=500 & 74.0& 66.18& 47.97& 23.98& 11.01& 4.59& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=600 & 74.04& 66.41& 48.23& 23.4& 9.74& 4.23& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=700 & 74.02& 66.47& 48.18& 22.86& 8.65& 3.78& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=800 & 74.07& 66.68& 48.12& 22.58& 7.62& 3.25& 1.06& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=900 & 74.01& 66.74& 48.24& 22.37& 6.88& 2.74& 1.08& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.50 & K=100 & 62.62& 55.99& 47.65& 38.37& 28.3& 19.54& 12.75& 7.55& 0.0& 0.0& 0.0\\
 & $\sigma = $0.50 & K=200 & 63.07& 57.27& 49.54& 40.25& 29.36& 19.44& 12.35& 7.43& 3.23& 0.0& 0.0\\
 & $\sigma = $0.50 & K=300 & 63.28& 57.91& 50.46& 41.4& 30.0& 19.41& 11.99& 7.08& 3.54& 0.0& 0.0\\
 & $\sigma = $0.50 & K=400 & 63.39& 58.25& 51.18& 41.98& 30.22& 19.11& 11.69& 6.9& 3.97& 0.0& 0.0\\
 & $\sigma = $0.50 & K=500 & 63.5& 58.51& 51.51& 42.4& 30.66& 18.7& 11.13& 6.73& 3.83& 1.06& 0.0\\
 & $\sigma = $0.50 & K=600 & 63.57& 58.72& 51.83& 42.62& 30.51& 18.66& 10.85& 6.44& 3.7& 1.61& 0.0\\
 & $\sigma = $0.50 & K=700 & 63.65& 58.9& 52.06& 42.79& 30.63& 18.25& 10.57& 6.25& 3.53& 1.67& 0.0\\
 & $\sigma = $0.50 & K=800 & 63.74& 59.02& 52.19& 42.96& 30.62& 18.2& 10.18& 5.84& 3.35& 1.66& 0.45\\
 & $\sigma = $0.50 & K=900 & 63.79& 59.09& 52.28& 43.03& 30.75& 18.21& 9.89& 5.53& 3.13& 1.62& 0.52\\
 \midrule
 \parbox[t]{2mm}{\multirow{30}{*}{\rotatebox[origin=c]{90}{MACER-DS$^2$ (n=1)}}} 
  & $\sigma = $0.12 & K=100 & 79.57& 61.25& 34.66& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=200 & 79.58& 61.57& 36.29& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=300 & 79.42& 61.35& 36.21& 0.06& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=400 & 79.44& 61.1& 35.32& 12.32& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=500 & 79.2& 60.64& 34.22& 13.65& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=600 & 79.09& 60.23& 33.75& 13.25& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=700 & 78.98& 60.01& 32.89& 12.66& 1.46& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=800 & 78.85& 59.65& 32.65& 12.07& 2.24& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=900 & 78.78& 59.52& 32.3& 11.4& 2.25& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=1000 & 78.73& 59.15& 31.58& 10.63& 2.05& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=100 & 71.45& 59.44& 45.71& 30.76& 14.57& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=200 & 71.81& 60.13& 46.5& 31.3& 16.2& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=300 & 71.81& 60.13& 46.5& 31.3& 16.2& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=400 & 71.91& 60.48& 46.51& 30.83& 16.73& 5.66& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=500 & 71.84& 60.56& 46.3& 30.26& 16.43& 7.07& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=600 & 71.77& 60.38& 45.92& 29.84& 16.11& 7.14& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=700 & 71.69& 60.12& 45.66& 29.41& 15.6& 7.0& 0.04& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=800 & 71.73& 60.19& 45.41& 28.91& 15.07& 6.68& 2.15& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=900 & 71.68& 60.11& 45.14& 28.51& 14.54& 6.23& 2.34& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=1000 & 71.63& 59.98& 44.97& 28.21& 14.08& 5.9& 2.12& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.50 & K=100 & 60.96& 53.69& 44.96& 36.64& 28.13& 20.46& 14.44& 8.73& 0.01& 0.0& 0.0\\
 & $\sigma = $0.50 & K=200 & 61.37& 54.35& 46.07& 37.43& 28.55& 20.58& 14.26& 8.65& 3.6& 0.0& 0.0\\
 & $\sigma = $0.50 & K=300 & 61.52& 54.74& 46.53& 37.9& 28.91& 20.62& 14.12& 8.42& 3.9& 0.0& 0.0\\
 & $\sigma = $0.50 & K=400 & 61.42& 54.81& 46.83& 38.02& 28.98& 20.51& 13.69& 8.3& 4.27& 0.0& 0.0\\
 & $\sigma = $0.50 & K=500 & 61.39& 54.74& 47.03& 38.2& 28.85& 20.25& 13.45& 8.14& 4.16& 1.0& 0.0\\
 & $\sigma = $0.50 & K=600 & 61.44& 54.8& 46.96& 38.2& 28.83& 19.97& 13.23& 7.94& 4.1& 1.53& 0.0\\
 & $\sigma = $0.50 & K=700 & 61.35& 54.75& 46.89& 38.04& 28.7& 19.53& 12.95& 7.64& 3.98& 1.7& 0.0\\
 & $\sigma = $0.50 & K=800 & 61.24& 54.75& 46.94& 38.1& 28.49& 19.3& 12.59& 7.46& 3.9& 1.69& 0.4\\
 & $\sigma = $0.50 & K=900 & 61.25& 54.73& 46.85& 37.94& 28.19& 18.87& 12.29& 7.13& 3.69& 1.7& 0.51\\
 & $\sigma = $0.50 & K=1000 & 61.21& 54.72& 46.84& 37.87& 27.97& 18.74& 12.08& 6.82& 3.43& 1.66& 0.71\\
 \bottomrule
\end{tabular}\vspace{-10pt}
\end{table*}

% (Test = 8) MACER on Cifar10 n=8
\begin{table*}[t]
\small
\centering
\caption{\textbf{Certified accuracy per radius on CIFAR10.} We report  $\text{MACER-DS}^2(n=8)$ under varying $\sigma$ and number of iterations $K$ in Algorithm \ref{alg:RS_DS}.}
\centering
\begin{tabular}{c|cc| ccccccccccc }
\toprule
        & \multicolumn{2}{c|}{\textcolor{black}{$\ell_2^r$ (CIFAR10)}}  & \multirow{1}{*}{0.0} & \multirow{1}{*}{0.25} & \multirow{1}{*}{0.50} & \multirow{1}{*}{0.75} & \multirow{1}{*}{1.00} & \multirow{1}{*}{1.25} & \multirow{1}{*}{1.50} & \multirow{1}{*}{1.75} & \multirow{1}{*}{2.00}& \multirow{1}{*}{2.25} & \multirow{1}{*}{2.50}\\
 \midrule
 \parbox[t]{2mm}{\multirow{30}{*}{\rotatebox[origin=c]{90}{MACER-DS$^2$ (n=8)}}} 
  & $\sigma = $0.12 & K=100 & 81.9& 62.52& 29.38& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=200 & 82.16& 63.09& 29.72& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=300 & 82.2& 63.4& 28.47& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=400 & 82.21& 63.51& 26.29& 6.84& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=500 & 82.34& 63.76& 24.13& 7.62& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=600 & 82.34& 63.66& 22.6& 7.05& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=700 & 82.32& 63.84& 21.61& 6.48& 0.6& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=800 & 82.35& 63.9& 21.06& 5.4& 0.83& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=900 & 82.37& 63.9& 20.79& 4.67& 0.82& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.12 & K=1000 & 82.39& 63.89& 20.57& 3.88& 0.77& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=100 & 75.18& 64.79& 47.14& 29.31& 12.56& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=200 & 75.36& 66.23& 48.55& 28.91& 13.99& 0.0& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=300 & 75.53& 66.87& 49.24& 28.31& 14.26& 0.01& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=400 & 75.57& 67.36& 49.53& 27.35& 13.94& 3.86& 0.0& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=500 & 75.65& 67.71& 49.59& 26.24& 13.23& 4.68& 0.01& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=600 & 75.72& 67.81& 49.64& 25.46& 12.12& 4.73& 0.01& 0.0& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=700 & 75.84& 67.93& 49.85& 24.92& 11.1& 4.58& 0.01& 0.01& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=800 & 75.81& 68.08& 49.84& 24.79& 10.18& 4.33& 1.05& 0.01& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=900 & 75.87& 68.16& 49.84& 24.53& 9.38& 3.81& 1.02& 0.01& 0.0& 0.0& 0.0\\
 & $\sigma = $0.25 & K=1000 & 75.87& 68.26& 49.94& 24.27& 8.66& 3.32& 0.93& 0.01& 0.01& 0.0& 0.0\\
 & $\sigma = $0.50 & K=100 & 61.79& 55.41& 47.8& 39.03& 29.04& 20.62& 13.83& 7.92& 0.0& 0.0& 0.0\\
 & $\sigma = $0.50 & K=200 & 62.11& 56.53& 49.36& 40.68& 30.08& 20.55& 13.38& 7.84& 3.07& 0.0& 0.0\\
 & $\sigma = $0.50 & K=300 & 62.31& 57.21& 50.54& 41.6& 30.79& 20.46& 13.03& 7.56& 3.44& 0.0& 0.0\\
 & $\sigma = $0.50 & K=400 & 62.49& 57.68& 51.12& 42.08& 31.12& 20.21& 12.45& 7.34& 3.65& 0.0& 0.0\\
 & $\sigma = $0.50 & K=500 & 62.62& 57.98& 51.61& 42.41& 31.3& 20.13& 12.09& 6.94& 3.65& 0.74& 0.0\\
 & $\sigma = $0.50 & K=600 & 62.71& 58.28& 51.82& 42.85& 31.52& 19.78& 11.58& 6.54& 3.56& 1.28& 0.0\\
 & $\sigma = $0.50 & K=700 & 62.84& 58.35& 52.15& 43.04& 31.42& 19.6& 11.12& 6.33& 3.29& 1.37& 0.0\\
 & $\sigma = $0.50 & K=800 & 62.91& 58.45& 52.33& 43.29& 31.48& 19.47& 10.62& 5.95& 3.21& 1.39& 0.27\\
 & $\sigma = $0.50 & K=900 & 62.93& 58.54& 52.56& 43.44& 31.49& 19.14& 10.17& 5.73& 3.09& 1.34& 0.38\\
 & $\sigma = $0.50 & K=1000 & 63.0& 58.66& 52.67& 43.47& 31.66& 19.04& 9.85& 5.49& 2.85& 1.21& 0.4\\
 \bottomrule
\end{tabular}\vspace{-10pt}
\end{table*}
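For reference, the certified accuracy at radius $r$ reported in these tables follows the definition of \cite{cohen2019certified}. It is the percentage of test inputs that the smoothed classifier classifies correctly (without abstaining) and certifies with radius at least $r$. The following is a minimal sketch of this computation. It assumes that the per-input predictions, labels, and certified radii have already been produced by the certification procedure, and that abstentions are encoded with a label that never matches a true class (e.g.\ $-1$). The function \texttt{certified\_accuracy} and its input format are hypothetical illustrations and are not part of our released code.

\begin{lstlisting}[language=Python]
import numpy as np


def certified_accuracy(preds, labels, radii, thresholds):
    """Certified accuracy (in %) at each radius threshold r.

    Hypothetical helper: an input counts at radius r only if the smoothed
    prediction is correct AND its certified radius is at least r.
    Abstentions must be encoded with a label that never matches a true
    class (e.g. -1), so they are always counted as incorrect.
    """
    preds = np.asarray(preds)
    labels = np.asarray(labels)
    radii = np.asarray(radii, dtype=float)
    correct = preds == labels  # mask of correct, non-abstained predictions
    return np.array([100.0 * np.mean(correct & (radii >= r)) for r in thresholds])


# Toy usage: four inputs, one abstention (-1) and one misclassification.
acc = certified_accuracy(
    preds=[0, 1, -1, 2],
    labels=[0, 1, 1, 0],
    radii=[0.6, 0.3, 0.0, 0.9],
    thresholds=[0.0, 0.25, 0.5],
)
print(acc)  # [50. 50. 25.]
\end{lstlisting}

Because an input can only drop out of the count as $r$ grows, each row of such a table is non-increasing in the radius, which gives a quick sanity check on the reported numbers.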

% \clearpage
% \section{Data-Dependent Smoothing for $\ell_1$ Certificates.}

% \textcolor{black}{While we focused both our methodology and experiments on $\ell_2$ certificates, our methodology extends to other $\ell_p$ certificates. To demonstrate this, we conducted experiments on $\ell_1$ certification. We leverage the results of \cite{yang2020randomized}, who derived the tightest $\ell_1$ certificate for randomized smoothing with the uniform distribution $\mathcal U[-\lambda, \lambda]^d$. The certified radius in that case has the form $\mathcal R_1 = \lambda(p_A - p_B)$. We replace our objective in Equation \eqref{eq:our_objective_v2} with:
% \[
% \lambda_x^* = \text{arg}\max_\lambda \lambda \left(\mathbb E_{\epsilon\sim \mathcal{U}[-\lambda, \lambda]^d}(f_\theta^{c_A}(x+\epsilon)) - \max_{c \neq c_A} \mathbb E_{\epsilon\sim \mathcal{U}[-\lambda, \lambda]^d}(f_\theta^{c}(x+\epsilon))  \right).
% \]
% We solve this objective exactly as in Algorithm \ref{alg:RS_DS}, with the same hyperparameters, for $\lambda \in \{0.25, 0.5, 1.0\}$ on both CIFAR10 and ImageNet. Further, we combine our data-dependent smoothed classifier with the memory-based algorithm proposed in Section 3.5. As in the $\ell_2$ case, the memory-based algorithm found no overlap between the certified regions of any pair of instances.
% We report the results in Figure \ref{fig:yang} and Table \ref{tab:l1}. As in our extensive $\ell_2$ experiments, our proposed memory-enhanced data-dependent smoothing consistently improves $\ell_1$ certified accuracy. It improves over the state-of-the-art certified accuracy at $\ell_1$ radius 0.5 by 7\% on CIFAR10 and 3\% on ImageNet. Finally, we observe a similar improvement in the $\ell_1$ ACR, as reported in Table \ref{tab:l1}.
% }

% \input{CVPR21/tables/l1}


% \begin{figure*}[t]
%      \centering
%      \begin{subfigure}[b]{0.32\textwidth}
%          \centering
%          \includegraphics[width=\textwidth]{ICLR22/new_figures/l1_cifar10-0.25.png}
%      \end{subfigure}
%     %  \hfill
%      \begin{subfigure}[b]{0.32\textwidth}
%          \centering
%          \includegraphics[width=\textwidth]{ICLR22/new_figures/l1_cifar10-0.5.png}
%      \end{subfigure}
%     %  \hfill
%      \begin{subfigure}[b]{0.32\textwidth}
%          \centering
%          \includegraphics[width=\textwidth]{ICLR22/new_figures/l1_cifar10-1.0.png}
%      \end{subfigure}
%     % \hfill
%     \begin{subfigure}[b]{0.32\textwidth}
%          \centering
%          \includegraphics[width=\textwidth]{ICLR22/new_figures/l1_imagenet-0.25.png}
%      \end{subfigure}
%     %  \hfill
%      \begin{subfigure}[b]{0.32\textwidth}
%          \centering
%          \includegraphics[width=\textwidth]{ICLR22/new_figures/l1_imagenet-0.5.png}
%      \end{subfigure}
%     %  \hfill
%      \begin{subfigure}[b]{0.32\textwidth}
%          \centering
%          \includegraphics[width=\textwidth]{ICLR22/new_figures/l1_imagenet-1.0.png}
%      \end{subfigure}
%         \caption{
%         \textcolor{black}{ 
%         \textbf{$\ell_1$ certified accuracy comparison against $\text{Yang}$ per radius per $\lambda$.} We compare $\text{Yang}$ against
%         % our data dependent certification
%         $\text{Yang-DS}$.
%         % and when data dependency is incorporated in both training and certification
%         % for several $\sigma$. 
%       We show CIFAR10 and ImageNet results in the first and second rows, respectively. As in the earlier $\ell_2$ experiments, deploying data-dependent smoothing with the memory-enhanced classifier significantly improves the $\ell_1$ certified accuracy in all considered scenarios. }}
%         % , where the last column is the envelope.}
%         \label{fig:yang}
% \end{figure*}


% \section{Runtime}
% \textcolor{black}{We measure the certification runtime on an NVIDIA Quadro RTX-6000 GPU for our proposed data-dependent smoothed classifier (including Algorithm \ref{alg:RS_DS} and the memory-based certification) and compare it to the certification of a fixed-$\sigma$ classifier. Certifying one CIFAR10 test input with ResNet18 takes 1.6 seconds with a fixed-$\sigma$ classifier and 1.8 seconds on average with the data-dependent classifier ($K = 900$). Certifying one ImageNet test input with ResNet50 takes 109.5 seconds with a fixed-$\sigma$ classifier and 136 seconds on average with our data-dependent classifier ($K=400$). The runtime overhead of
% Algorithm \ref{alg:RS_DS} and the memory-based certification (roughly 12\% on CIFAR10 and 24\% on ImageNet) is modest compared to the gains in certified accuracy.}


% \section{Where are the good optimal $\sigma^*_x$?}

% \begin{figure}
%     \centering
%     \includegraphics[width=0.5\textwidth]{UAI22/rebuttal_figures/cohen_comparison_0.25.pdf}
%     \caption{\textbf{Where are the good $\sigma_x^*$?} Histogram of the optimized $\sigma_x^*$ (orange), with the subset of $\sigma_x^*$ at which the certified radius improves shown in green.}
%     \label{fig:final-experiment}
% \end{figure}


% % \textbf{Regarding the minor issues.} We thank the reviewer for this insightful experiments. As suggested, 
% Finally, a natural question is which values of $\sigma_x^*$ yield better certified radii.
% To answer this, we conduct the following experiment for the Cohen baseline at $\sigma=0.25$.
% % we conducted this experiment for Cohen baseline at $\sigma=0.25$. 
% % W
% We plot the histogram of the obtained $\sigma_x^*$ on CIFAR10 in orange, and the histogram of the $\sigma_x^*$ at which the certified radius improves in green. We report the results in Figure~\ref{fig:final-experiment}.
% We find that the certified radius improves across the full range of $\sigma_x^*$, showing the efficacy of our proposed data-dependent smoothing.
% We will include a more detailed version of this experiment in the final version.

% \clearpage

% \subsection{Results for CIFAR10 - $\sigma = 1.0$.}
% We extend our evaluation on CIFAR10 to $\sigma=1.0$. We report the results in Table \ref{tb:sigma-1.0}, where our data-dependent framework again boosts the certified accuracy at different radii.
% \begin{table*}
% \scriptsize
% \centering
% \caption{\textbf{Certified Accuracy at $\sigma = 1.0$}
% % We compare the best certified accuracy and ACR 
% }
% % \BG{FS is not defined here nor in the text}\BG{the hlines in the last column where ACR is look short}\BG{what does bold mean?}}
% % per radius on both CIFAR10 and ImageNet.}
% % \vspace{-0.25cm}
% \centering
% \begin{tabular}{c|cc| cccccccccc c}
% \toprule 
% \midrule
%     \multirow{2}{*}{\textcolor{black}{CIFAR10}} & \multicolumn{2}{c|}{Radius}  & \multirow{2}{*}{0.0} & \multirow{2}{*}{0.25} & \multirow{2}{*}{0.50} & \multirow{2}{*}{0.75} & \multirow{2}{*}{1.00} & \multirow{2}{*}{1.25} & \multirow{2}{*}{1.50} & \multirow{2}{*}{1.75} & \multirow{2}{*}{2.00}& \multirow{2}{*}{2.25} &  \multirow{2}{*}{\text{ACR}}\\
%     & \text{Train} & \text{Certify} &  &&&&&& &\\
% \midrule
% \text{\textsc{Cohen}} & \text{FS} & \text{FS} & &  &  &  &  & &  &  &  &  &    \\
% \text{\textsc{Cohen}-DS} & \text{FS} & \text{DS}& &  &  &  &  &  &  &  &  &  &   \\
% \midrule
% \text{\textsc{SmoothAdv}} & \text{FS} & \text{FS} & &  &  &  &  & &  &  &  &  &    \\
% \text{\textsc{SmoothAdv}-DS} & \text{FS} & \text{DS}& &  &  &  &  &  &  &  &  &  &   \\
% % \midrule
% % \text{\textsc{MACER}} & \text{FS} & \text{FS} & &  &  &  &  & &  &  &  &  &    \\
% % \text{\textsc{MACER}-DS} & \text{FS} & \text{DS}& &  &  &  &  &  &  &  &  &  &   \\
% \midrule
% \midrule

%     \multirow{2}{*}{\textcolor{black}{ImageNet}} & \multicolumn{2}{c|}{Radius}  & \multirow{2}{*}{0.0} & \multirow{2}{*}{0.25} & \multirow{2}{*}{0.50} & \multirow{2}{*}{0.75} & \multirow{2}{*}{1.00} & \multirow{2}{*}{1.25} & \multirow{2}{*}{1.50} & \multirow{2}{*}{1.75} & \multirow{2}{*}{2.00}& \multirow{2}{*}{2.25} &  \multirow{2}{*}{\text{ACR}}\\
%     & \text{Train} & \text{Certify} &  &&&&&& &\\
% \midrule
% \text{\textsc{Cohen}} & \text{FS} & \text{FS} & &  &  &  &  & &  &  &  &  &    \\
% \text{\textsc{Cohen}-DS} & \text{FS} & \text{DS}& &  &  &  &  &  &  &  &  &  &   \\
% \midrule
% \text{\textsc{SmoothAdv}} & \text{FS} & \text{FS} & &  &  &  &  & &  &  &  &  &    \\
% \text{\textsc{SmoothAdv}-DS} & \text{FS} & \text{DS}& &  &  &  &  &  &  &  &  &  &   \\
% \bottomrule
% \end{tabular}\label{tb:sigma-1.0}
% \end{table*}



