\section{Introduction}
\label{introduction}
\blfootnote{* Equal contribution.}
\blfootnote{Correspondence to: motasem.alfarra@kaust.edu.sa, adel.bibi@eng.ox.ac.uk}
\footnote{Code: \url{https://github.com/MotasemAlfarra/Data_Dependent_Randomized_Smoothing}.}
Despite the success of Deep Neural Networks (DNNs) in various learning tasks \citep{krizhevsky2012imagenet,long2015fully}, they have been shown to be vulnerable to small, carefully crafted adversarial perturbations \citep{goodfellow2014explaining,szegedy2013intriguing}. For a DNN $f$ that correctly classifies an image $x$, $f$ can be fooled into producing an incorrect prediction for $x+\eta$ even when the adversarial perturbation $\eta$ is so small that $x$ and $x+\eta$ are indistinguishable to the human eye. % Even worse, such adversaries, $\delta$, are in many cases easy to tailor with routines that are as simple as a single gradient ascent iteration of some loss function over the input \cite{goodfellow2014explaining}. This is of a critical concern particularly that DNNs are deployed in safety critical applications, \eg self driving cars, which can hinder their public trust. 
To address this vulnerability, several works have proposed heuristic training procedures to build networks that are \textit{robust} against such perturbations \citep{cisse2017parseval,madry2017towards}. However, many of these works provided a false sense of security as they were subsequently broken, \ie shown to be ineffective against stronger adversaries 
\input{figs/pull}
\citep{athalye2018obfuscated,tramer2020adaptive,uesato2018adversarial}. This has inspired researchers to develop networks that are \textit{certifiably robust}, \ie networks that provably output constant predictions over a characterized region around every input. Among many certification methods, a probabilistic approach called \textit{randomized smoothing} has demonstrated state-of-the-art \textcolor{black}{certifiable robustness results \citep{cohen2019certified,lecuyer2019certified,li2018certified}.} In a nutshell, given an input $x$ and a base classifier $f$, \eg a DNN, randomized smoothing constructs a ``smooth classifier'' $g(x) = \mathbb{E}_{\epsilon\sim \mathcal{D}}\left[f(x+\epsilon)\right]$ such that, under some choices of $\mathcal{D}$, $g(x) = g(x+\delta)~\forall \delta \in \mathcal{R}$. As such, $g$ is certifiable within the certification region $\mathcal{R}$ characterized by $x$ and the smoothing distribution $\mathcal{D}$. While there has been considerable progress in devising a notion of ``optimal'' smoothing distribution $\mathcal{D}$ for when $\mathcal{R}$ is characterized by an $\ell_p$ certificate \citep{yang2020randomized}, a common trait among all works in the literature is that the choice of $\mathcal{D}$ is independent of the input $x$. For example, one of the earliest works on randomized smoothing grants $\ell_2$ certificates under $\mathcal{D}=\mathcal{N}(0,\sigma^2 I)$, where $\sigma$ is a free parameter that is constant for all $x$ \citep{cohen2019certified}. That is to say, the classifier $f$ is smoothed to a classifier $g$ uniformly (same variance $\sigma^2$) over the entire input space. The choice of $\sigma$ used for certification is often set either arbitrarily or via cross-validation to obtain the best certification results \citep{salman2019provably}. 
We believe this is suboptimal and that $\sigma$ should vary with the input $x$ (\ie be data dependent), since a fixed $\sigma$ may under-certify inputs that are far from the decision boundaries (\ie the constructed smooth classifier $g$ produces smaller certification radii than it could), as exemplified by $x_1$ in Figure~\ref{fig:pull_fig}. Moreover, a fixed $\sigma$ could be too large for inputs $x$ close to the decision boundaries, resulting in a smooth classifier $g$ that incorrectly classifies $x$ (refer to $x_3$ in Figure~\ref{fig:pull_fig}).
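To make this trade-off concrete, recall the $\ell_2$ certificate of \citet{cohen2019certified} for $\mathcal{D}=\mathcal{N}(0,\sigma^2 I)$, stated here only as an illustrative sketch; the symbols $p_A$, $p_B$, $R$, and $\Phi^{-1}$ are notation we assume for this example. If $p_A$ lower bounds the probability that $f(x+\epsilon)$ returns the top class and $p_B \leq p_A$ upper bounds the probability of any other class, then the prediction of $g$ is constant for all $\|\delta\|_2 < R$ with
\[
R = \frac{\sigma}{2}\left(\Phi^{-1}(p_A) - \Phi^{-1}(p_B)\right),
\]
where $\Phi^{-1}$ is the inverse of the standard Gaussian CDF. The radius scales linearly with $\sigma$, but $p_A$ and $p_B$ themselves depend on $\sigma$. For an input far from the decision boundaries, $p_A$ can remain close to $1$ even under larger noise, so a small fixed $\sigma$ leaves part of the attainable radius unclaimed. For an input near a boundary, a large $\sigma$ typically shrinks the gap between $p_A$ and $p_B$ and can flip the prediction of $g$.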

In this paper, we aim to introduce more structure to the smoothing distribution $\mathcal{D}$ by rendering its parameters data dependent. That is to say, the base classifier $f$ is smoothed with a family of smoothing distributions to produce $g(x) = \mathbb{E}_{\epsilon \sim \mathcal{N}(0,\textcolor{black}{\sigma^2_x}I)}\left[f(x+\epsilon)\right]$.%
\footnote{The paper mainly focuses on Gaussian smoothing, but the idea holds for other parameterized distributions.}
Note here that the variance of the Gaussian is now dependent on the input $x$. However, because $\sigma_x$ varies with $x$, classical randomized smoothing certification does not apply directly. We propose a simple memory-based approach to certify the resultant data dependent smooth classifier $g$. We show that \textcolor{black}{our memory-enhanced data dependent smooth classifier}
% this 
can boost the certification performance of several randomized smoothing techniques. Our contributions are threefold. (\textbf{i}) We propose a parameter-free and generic framework that can easily turn several randomized smoothing techniques into their data dependent variants. In particular, given a network $f$ and an input $x$, we propose to optimize the smoothing distribution parameters for every $x$, \eg $\sigma^*_x$, so that they maximize the certification radius. This choice of $\sigma^*_x$ is then used to smooth $f$ at $x$ and construct a smoothed classifier $g$. \textcolor{black}{Moreover, as the data dependent smooth classifier is not directly certifiable using the Monte Carlo approach of \cite{cohen2019certified}, we propose a memory-enhanced data dependent smooth classifier for certification.} (\textbf{ii}) We demonstrate the effectiveness of our \textcolor{black}{memory-enhanced data dependent smoothing}
% framework 
by showing that we can improve the certified accuracy of several models, specifically models trained with Gaussian augmentation (\textsc{Cohen}) \citep{cohen2019certified}, adversaries on the smoothed classifier (\textsc{SmoothAdv}) \citep{salman2019provably}, and radius regularization (\textsc{MACER}) \citep{zhai2020macer}, \textit{without any model retraining}. We boost the certified accuracy of the best baseline by 5.4\% on CIFAR10 and by 2.8\% on ImageNet for $\ell_2$ perturbations within a ball of radius less than 0.5 ($\approx 127/255$). (\textbf{iii}) We show that incorporating the proposed data dependent smoothing in the training pipeline of \textsc{Cohen}, \textsc{SmoothAdv}, and \textsc{MACER} can further boost results, yielding certified accuracies of 68.3\% on CIFAR10 and 64.2\% on ImageNet for $\ell_2$ perturbations of norm less than 0.25.
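
As a sketch of the per-input optimization in contribution (\textbf{i}), and under notation we assume for illustration only, let $R(x,\sigma)$ denote the certified radius obtained at $x$ when $f$ is smoothed with $\mathcal{N}(0,\sigma^2 I)$. The data dependent parameter is then chosen as
\[
\sigma^*_x \in \arg\max_{\sigma > 0} R(x,\sigma),
\]
and the smoothed prediction at $x$ is computed with $\mathcal{N}(0,(\sigma^*_x)^2 I)$. Since $\sigma^*_x$ now differs across inputs, the certificate at $x$ cannot be read off from a single fixed-$\sigma$ guarantee, which is what motivates the memory-based certification.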


