\section{Related Work}
\label{related_work}

\textbf{Certified Defenses.} Certified defenses aim to guarantee that no adversarial example exists within a given region around an input. They can be divided into exact certification \citep{cheng2017maximum,lomuscio2017approach,huang2017safety,ehlers2017formal} and relaxed certification \citep{salman2019convex, wong2018provable}. Exact certification generally scales poorly and has been limited to networks with at most 3 hidden layers \citep{tjeng2017evaluating}. Relaxed methods address this issue by computing an upper bound on the worst-case adversarial loss over all bounded perturbations around a given input \citep{weng2018towards}. However, even relaxed certification remains too expensive to be embedded in a routine that interleaves certification with training.
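The following minimal sketch illustrates what a relaxed certificate computes. It uses interval bound propagation, one of the simplest relaxations, rather than the tighter convex relaxations cited above; the network weights, the input, and the function names (\texttt{affine\_box}, \texttt{margin\_lower\_bound}, \texttt{margin}) are hypothetical. The sketch returns a lower bound on the logit margin (equivalently, an upper bound on the worst-case margin loss) over an $\ell_2$ ball of radius $\epsilon$. A positive bound certifies the input, whereas a non-positive bound is only inconclusive, since the relaxation may be loose; an exact verifier would instead decide the question either way.

```python
import math
import random


def affine_box(W, b, mid, rad):
    """Propagate the box mid +/- rad through x -> W x + b (interval arithmetic)."""
    out_mid = [sum(w * m for w, m in zip(row, mid)) + bi for row, bi in zip(W, b)]
    out_rad = [sum(abs(w) * r for w, r in zip(row, rad)) for row in W]
    return out_mid, out_rad


def margin_lower_bound(W1, b1, W2, b2, x, eps, y, j):
    """Sound lower bound on logit_y - logit_j over all ||delta||_2 <= eps."""
    # Exact per-unit pre-activation range over the l2 ball:
    # (W1 x + b1)_i +/- eps * ||W1_i||_2.
    pre = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W1, b1)]
    spread = [eps * math.sqrt(sum(w * w for w in row)) for row in W1]
    lo = [max(0.0, p - s) for p, s in zip(pre, spread)]  # ReLU is monotone,
    hi = [max(0.0, p + s) for p, s in zip(pre, spread)]  # so it maps boxes to boxes.
    # Relaxation step: correlations between hidden units are discarded.
    mid = [(l + h) / 2 for l, h in zip(lo, hi)]
    rad = [(h - l) / 2 for l, h in zip(lo, hi)]
    w_margin = [[a - c for a, c in zip(W2[y], W2[j])]]
    (m,), (r,) = affine_box(w_margin, [b2[y] - b2[j]], mid, rad)
    return m - r


def margin(W1, b1, W2, b2, x, y, j):
    """Exact logit margin of the network at a single point x."""
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + bi) for row, bi in zip(W1, b1)]
    z = [sum(w * hi for w, hi in zip(row, h)) + bi for row, bi in zip(W2, b2)]
    return z[y] - z[j]


# Hypothetical 2-3-2 ReLU network and input; class 0 is predicted at x.
W1 = [[1.0, -1.0], [0.5, 2.0], [-1.5, 0.5]]
b1 = [0.0, 0.1, 0.2]
W2 = [[1.0, 0.5, -1.0], [-0.5, 1.0, 0.5]]
b2 = [0.0, 0.0]
x = [1.0, 0.0]

for eps in (0.1, 1.0):
    lb = margin_lower_bound(W1, b1, W2, b2, x, eps, y=0, j=1)
    print(f"eps={eps}: margin lower bound {lb:.3f} ->",
          "certified" if lb > 0 else "inconclusive")
```

For this toy network the bound certifies the input at $\epsilon = 0.1$ and is inconclusive at $\epsilon = 1$. Computing it costs roughly one forward pass, which is why relaxed methods scale far better than exact ones.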


\textbf{Randomized Smoothing.} %Randomized smoothing is a recent probabilistic approach to certification. 
The earliest work on randomized smoothing \citep{lecuyer2019certified} approached it from a differential privacy perspective, showing that smoothing with Laplacian noise yields an $\ell_1$ certification radius within which the prediction of the classifier averaged under this noise remains constant. This was later followed by a tight $\ell_2$ certification radius for Gaussian smoothing \citep{cohen2019certified}. Since then, a body of work has extended randomized smoothing, for instance by combining it with empirical defenses \citep{salman2019provably} and by using it to certify black-box classifiers \citep{salman2020black}. Other works derived certification guarantees for $\ell_1$ bounded \citep{teng2019ell_1}, $\ell_\infty$ bounded \citep{zhang2019filling}, and $\ell_0$ bounded \citep{levine2020robustness} perturbations. More recently, a framework that finds the optimal smoothing distribution for a given $\ell_p$ norm was proposed \citep{yang2020randomized}, achieving state-of-the-art certification results for $\ell_1$ perturbations.

We depart from this literature by introducing smoothing, specifically Gaussian smoothing for $\ell_2$ perturbations, whose magnitude varies with the input. Since an input $x$ that is far from the decision boundaries should tolerate more smoothing (and hence admit a larger certification radius) than inputs closer to these boundaries, we optimize the per-input smoothing level $\sigma_x$ to maximize the certification radius. We refer to this process as \emph{data dependent smoothing}, and we provide a procedure for certifying the resulting smooth classifier.


% \BG{similar to the intro, we have to give a clear justification of  why we expect that making it data dependent will be better}