%\documentclass{uai2022} % for initial submission
 \documentclass[accepted]{uai2022} % after acceptance, for a revised
                                    % version; also before submission to
                                    % see how the non-anonymous paper
                                    % would look like
%% There is a class option to choose the math font
%\documentclass[mathfont=ptmx]{uai2022} % ptmx math instead of Computer
                                         % Modern (has noticable issues)
% \documentclass[mathfont=newtx]{uai2022} % newtx fonts (improves upon
                                          % ptmx; less tested, no support)
% NOTE: Only keep *one* line above as appropriate, as it will be replaced
%       automatically for papers to be published. Do not make any other
%       change above this note for an accepted version.

%% Choose your variant of English; be consistent
% \usepackage[american]{babel}
\usepackage[british]{babel}

%% Some suggested packages, as needed:
\usepackage{natbib} % has a nice set of citation styles and commands
    \bibliographystyle{plainnat}
    \renewcommand{\bibsection}{\subsubsection*{References}}
\usepackage{mathtools} % amsmath with fixes and additions
% \usepackage{siunitx} % for proper typesetting of numbers and units
\usepackage{booktabs} % commands to create good-looking tables
\usepackage{tikz} % nice language for creating drawings and diagrams

%% Provided macros
% \smaller: Because the class footnote size is essentially LaTeX's \small,
%           redefining \footnotesize, we provide the original \footnotesize
%           using this macro.
%           (Use only sparingly, e.g., in drawings, as it is quite small.)

%% Self-defined macros
\newcommand{\swap}[3][-]{#3#1#2} % just an example

% -------
% ADDED by us 
% cross-ref Latex
%\usepackage{zref-xr,zref-user}
%\zexternaldocument*{gubri_18-supp}
\usepackage{xr}
\makeatletter
\newcommand*{\addFileDependency}[1]{% argument=file name and extension
  \typeout{(#1)}
  \@addtofilelist{#1}
  \IfFileExists{#1}{}{\typeout{No file #1.}}
}
\makeatother
% add prefix to avoid issues with bibtex
\newcommand*{\myexternaldocument}[1]{%
    \externaldocument[app-]{#1}%
    \addFileDependency{#1.tex}%
    \addFileDependency{#1.aux}%
}
\myexternaldocument{gubri_18-supp}

\usepackage{amsmath}
\usepackage{amssymb}
\usepackage[super]{nth}
\usepackage{multirow}
\usepackage{booktabs} % for tables
\usepackage{makecell}
\usepackage{xcolor} % to load extra colors
\usepackage{colortbl}
\usepackage{quiver}
\usepackage{caption}
\usepackage{subcaption}
\usepackage{algorithm}
\usepackage{algorithmic}
\usepackage[capitalize,noabbrev]{cleveref}



% table column alignement with fixed width
\usepackage{array}
\newcolumntype{L}[1]{>{\raggedright\let\newline\\\arraybackslash\hspace{0pt}}m{#1}}
\newcolumntype{C}[1]{>{\centering\let\newline\\\arraybackslash\hspace{0pt}}m{#1}}
\newcolumntype{R}[1]{>{\raggedleft\let\newline\\\arraybackslash\hspace{0pt}}m{#1}}

% math ops
\DeclareMathOperator*{\argmax}{arg\,max}
\DeclareMathOperator*{\argmin}{arg\,min}
\DeclareMathOperator{\E}{\mathbb{E}}

% theorem
\usepackage{amsthm}
%\newtheorem{theorem}{Theorem}
%\theoremstyle{theorem}
%\newtheorem{postulate}{Postulate}
\theoremstyle{definition}
\newtheorem{assumption}{Assumption}
\newtheorem*{remark}{Remark}
\theoremstyle{definition}
\newtheorem{definition}{Definition}

% -------


\title{Efficient and Transferable Adversarial Examples \\ from Bayesian Neural Networks}

% The standard author block has changed for UAI 2022 to provide
% more space for long author lists and allow for complex affiliations
%
% All author information is automatically removed by the class for the
% anonymous submission version of your paper, so you can already add your
% information below.
%
% Add authors
\author[1]{\href{mailto:<martin.gubri@uni.lu>?Subject=Your UAI 2022 paper on transferability from BNN}{Martin~Gubri}{}}
\author[1]{Maxime~Cordy}
\author[1]{Mike~Papadakis}
\author[1]{Yves~Le~Traon}
\author[2]{Koushik~Sen}
% Add affiliations after the authors
\affil[1]{%
    University of Luxembourg\\
    Luxembourg, LU
}
\affil[2]{%
    University of California\\
    Berkeley, CA, USA
}

  \begin{document}
\maketitle

\begin{abstract}
    An established way to improve the transferability of black-box evasion attacks is to craft the adversarial examples on an ensemble-based surrogate to increase diversity. We argue that transferability is fundamentally related to uncertainty. Based on a state-of-the-art Bayesian Deep Learning technique, we propose a new method to efficiently build a surrogate by sampling approximately from the posterior distribution of neural network weights, which represents the belief about the value of each parameter. Our extensive experiments on ImageNet, CIFAR-10 and MNIST show that our approach improves the success rates of four state-of-the-art attacks significantly (up to 83.2 percentage points), in both intra-architecture and inter-architecture transferability. On ImageNet, our approach can reach a 94\% success rate while reducing training computations from 11.6 to 2.4 exaflops, compared to an ensemble of independently trained DNNs. In 87.5\% of cases, our vanilla surrogate achieves higher transferability than three test-time techniques designed for this purpose. Our work demonstrates that the way a surrogate is trained has been overlooked, although it is an important element of transfer-based attacks. We are, therefore, the first to review the effectiveness of several training methods in increasing transferability. We provide new directions to better understand the transferability phenomenon and offer a simple but strong baseline for future work.
\end{abstract}


%--------------------- INTRO
\section{Introduction}

\begin{figure}
\centering
\includegraphics[width=\columnwidth]{figure/schema2_reduced.pdf}
\caption{Illustration of the proposed approach.}
\label{fig:schema-approach}
\end{figure}

Deep Neural Networks (DNNs) have attracted considerable attention in recent years thanks to their ability to efficiently solve various tasks, especially in computer vision \citep{Dargan2019ALearning}. However, a common pitfall of these models is that they are vulnerable to adversarial examples, i.e., misclassified examples that result from slightly altering a well-classified example at test time \citep{Biggio2013, Szegedy2013}.
This constitutes a critical security threat, as a malicious third party may exploit this property to enforce some desired outcome.

Such \emph{adversarial attacks} have been primarily designed in white-box settings, where the attacker is assumed to have complete knowledge of the target DNN (including its weights). While studying such worst-case scenarios is essential for proper security assessment, in practice the attacker typically has limited knowledge of the target model. In such a case, the adversarial attack is applied to a surrogate model, with the hope that the crafted adversarial examples \emph{transfer to} (i.e., are also misclassified by) the target DNN.

Achieving transferability remains challenging, though. This is because adversarial attacks were designed to optimize the loss function of a specific model \citep{Goodfellow2014ExplainingExamples, Kurakin2019AdversarialWorld}, different from that of the target model.
To address this, \citet{Liu2017DelvingAttacks} improved transferability by attacking an \emph{ensemble} of architectures. The key intuition is that adversarial examples that fool a diverse set of models are more likely to generalize. While ensemble-based attacks typically report significantly higher success rates than their single-model counterparts, their computational cost is prohibitive because several surrogate models must be trained independently (to form a diverse ensemble).

In this paper, we analyse the unknown target model with a probabilistic eye, and relate transfer-based attacks to uncertainty. We propose a new method to improve the transferability of adversarial examples using approximate Bayesian inference to build a surrogate, and do so with less computational overhead than ensemble-based methods. Our approach, shown in Figure~\ref{fig:schema-approach}, builds upon recent results in Bayesian Deep Learning. More precisely, we train our surrogate with a cyclical variant of Stochastic Gradient Markov Chain Monte Carlo, namely \emph{cSGLD} \citep{Zhang2020CyclicalLearning}, to sample from the posterior distribution of neural network weights. We then perform efficient approximate Bayesian model averaging during the attack with minimal modifications of the attack algorithms.

We evaluate our approach on the ImageNet, CIFAR-10 and MNIST datasets with a variety of DNN architectures, four adversarial attacks, and three test-time transformations. Overall, our results indicate that applying cSGLD significantly improves the success rate compared to training single DNNs and outperforms classical ensemble-based attacks in terms of computation cost. Deep Ensemble requires at least 2.51 times more flops to achieve the same success rates as cSGLD when the targeted architecture is known. This can represent, on ImageNet, a saving of 3.56 exaflops (2.36 vs 5.92). At constant computation cost, our method increases the intra-architecture transfer success rates by 1.6 to 82.0 percentage points and changes the inter-architecture transfer success rates by $-2.3$ to 83.2 percentage points. cSGLD always raises the effectiveness of test-time techniques designed for transferability, by 3.8 to 56.2 percentage points. Applied alone, it is more effective than these techniques applied to a single DNN in 105 out of 120 cases.

To summarize, our contributions are:
\begin{itemize}
    \item We relate uncertainty and transferability of adversarial examples with a Bayesian perspective. The posterior distribution represents a belief about the unknown target model.
    \item We propose the first method based on a Bayesian Deep Learning technique to generate transferable adversarial examples. Existing iterative attacks can be easily modified to perform approximate Bayesian model averaging at no additional computational cost.
    \item We pave the way for improving surrogates at train-time by evaluating six Bayesian and ensemble techniques. cSGLD is a strong competitor, though other techniques open promising avenues.
    \item We advocate the use of a new metric, T-DEE, to compare the effectiveness of transferability techniques with the strong baseline of Deep Ensemble.
    \item Our evaluation on ImageNet, CIFAR-10 and MNIST reveals significant improvements over the single-DNN and Deep Ensemble baselines in diverse experimental settings. Our train-time method improves existing test-time techniques and, when compared against them directly, performs better in most cases. We open new ways to understand transferability.
\end{itemize}

%----------------------------------background

\section{Background and Related work}
\label{sec:related}

\paragraph{Adversarial attacks.} We consider four gradient-based attacks, which aim to maximise the prediction loss $L(x, y, \theta)$ under a $p$-norm constraint: $ \argmax_{\| \delta \|_p \leq \varepsilon} L(x + \delta, y, \theta) $. FGSM \citep{Goodfellow2014ExplainingExamples} is an L$\infty$ single-step attack defined by $\delta_{\text{FGSM}} = \varepsilon \, \text{sign}(\nabla_x L(x, y, \theta)) $. Its L$2$ equivalent is: $\delta_{\text{FGM}} = \varepsilon \, \frac{\nabla_x L(x, y, \theta)}{ \| \nabla_x L(x, y, \theta) \|_2 } $. The adversarial example is then clipped to $[0,1]$. I\nobreakdash-FGSM \citep{Kurakin2019AdversarialWorld} applies FGSM iteratively with a small step size $\alpha$: $\delta_0 = 0$ and $ \delta_{i+1} = \text{proj}_{B_\varepsilon}(\delta_i + \alpha \, \text{sign}(\nabla_x L(x + \delta_i, y, \theta))) $, where $\text{proj}_{B_\varepsilon}(\bullet)$ projects the perturbation onto the L$p$ ball of radius $\varepsilon$. The L$2$ variant is derived similarly. MI-FGSM \citep{Dong2018} adds a momentum term with decay factor $\mu$ to the previous attack: $g_{i+1} = \mu \, g_i + \frac{\nabla_x L(x + \delta_i, y, \theta)}{ \| \nabla_x L(x + \delta_i, y, \theta) \|_1 }$ and $ \delta_{i+1} = \text{proj}_{B_\varepsilon}(\delta_i + \alpha \, \text{sign}(g_{i+1})) $. PGD \citep{Madry2018TowardsAttacks} adds random restarts to I-FGSM, with $\delta_0$ sampled uniformly inside the ball $B_\varepsilon$. \cref{app-fig:relation-attacks} in \cref{app-sec:xp-setup-appendix} illustrates how these attacks relate to one another.

\paragraph{Ensemble surrogate.} \citet{Liu2017DelvingAttacks} show the benefit of ensembling architectures for inter-architecture transfer-based black-box attacks.
Our work builds on theirs and complements it by demonstrating that attacking models sampled with cSGLD (performing Bayesian model averaging on a single architecture) achieves better transferability at a lower computational cost.

\paragraph{Input and model transformations.} Other approaches have been developed to 
%Several plug-in techniques to existing attacks have been developed in the literature to 
improve transferability of adversarial examples. They transform the model or the input at test time (i.e., after training, when performing the attack). \textit{Ghost Networks} (GN) \citep{Li2018LearningNetworks} use Dropout and Skip Connection Erosion to generate on-the-fly diverse sets of surrogate models from one or more base models. \textit{Input Diversity} (DI) \citep{Xie2019} applies random transformations (random resize followed by random padding) to the input images at each attack iteration. \textit{Skip Gradient Method} (SGM) \citep{Wu2020SkipResNets} favours the gradients from skip connections rather than residual modules through a decay factor applied to the latter during the backward pass. 
These techniques can naturally be combined with ours: (i) cSGLD can provide, at a low computational cost, a diverse set of base models to build GN;
(ii) DI applies transformations to adversarial inputs independently of the surrogate models;
(iii) SGM modifies backward passes during the attack, independently of the training method.
As our evaluation will reveal, our train-time method further improves the transferability of the above three techniques and outperforms them 87.5\% of the time. It is also compatible with other test-time approaches not considered in this paper, such as 
linear backpropagation \citep{Guo2020a}, intermediate level attack \citep{Huang2019EnhancingAttack}, Nesterov accelerated gradient and scale invariance \citep{Lin2020NesterovAttacks}, and serial mini-batch ensemble attack \citep{Che2020AMemories.}.


\paragraph{Bayesian Neural Network (BNN) and adversarial examples.}
Unlike our work, which uses Bayesian Deep Learning as a means to attack classical DNNs, past research aimed at generating adversarial examples for BNNs. \citet{Grosse2018} show that BNN uncertainty measures are vulnerable to high-confidence-low-uncertainty adversarial examples crafted on Gaussian Processes. \citet{Palacci2018ScalablePractice} show that several SG-MCMC sampling schemes are not secure against white-box attacks. \citet{Wang2018} use SGLD and Generative Adversarial Networks to detect adversarial examples instead of crafting them.



\citet{Carbone2020RobustnessAttacks} claim that BNNs are robust against gradient-based attacks because gradients vanish in expectation under the true posterior distribution. Their conclusions hold theoretically under the restrictive assumption of the large-data overparametrized limit, and experimentally for HMC and VI on MNIST and Fashion MNIST. In \cref{app-sec:vanishing-grads-appendix}, our experiments reveal opposite conclusions about cSGLD: our DNN surrogates suffer from vanishing gradients more often than our cSGLD surrogates. On MNIST, we observe that 60.6--86.6\% of the individual gradients of HMC or VI vanish before they are averaged. Therefore, the theoretical development of \citet{Carbone2020RobustnessAttacks} does not seem to explain most of the gradient vanishing. Furthermore, VI on larger datasets (ImageNet and CIFAR-10) does not suffer from vanishing gradients.
%We do not observe an averaging effect for HMC (and a slight one for VI) indicating that vanishing gradients might come from another effect during training. 


%---------------------------------- approach

\section{Approach}
\label{sec:approach}

\noindent\textbf{A Bayesian perspective on transferability.} Under a specified threat model, we relate uncertainty and posterior predictive distributions to transferability. We consider a classification problem with a training dataset $\mathcal{D} = \{ (x_i, y_i) \sim p(x, y) \}_{i=1}^N$ and $C$ class labels. A probabilistic classifier parametrized by $\theta$ maps $x_i$ into a predictive distribution $\hat p(y | x_i, \theta)$. A white-box adversarial perturbation of a test example $(x,y) \sim p(x, y)$ against such a classifier is defined as:
$$\delta_\theta \in \argmin_{\| \delta \|_p \leq \varepsilon} \hat p(y | x+\delta, \theta).$$
In practice, this optimization problem is solved by replacing the predictive distribution with a loss function (see \cref{sec:related}). 
The \textit{transferability} phenomenon is the empirical observation that an adversarial example for one model is likely to be adversarial for another one \citep{Goodfellow2014ExplainingExamples}. 
Black-box attacks can leverage this property by crafting adversarial examples using white-box attacks against a surrogate model to target an unseen model \citep{Papernot2016}.

\begin{assumption}[Threat model]
\label{assum:threat-model}
We define our threat model with the following assumptions on the targeted classifier:
\begin{enumerate}
    \item Its architecture is known and so is its prediction function $\hat p(y | x, \bullet)$\footnote{We discuss the unknown architecture case further on.}. % or rename: prediction function (incl. architecture)
    \item Its training set $\mathcal{D}$ is known.
    \item Its parameters $\theta_t$, estimated by maximum likelihood, are unknown.
    \item A reasonable prior on its parameters $p(\theta_t)$ is known\footnote{In practice, it corresponds to knowing the weight decay hyperparameter, see discussion below.}.
    \item No oracle access (test-time feedback) is possible.
\end{enumerate}
\end{assumption}

Assuming the threat model described in Assumption~\ref{assum:threat-model}, \textit{uncertainty about the target parameters arises from the stochastic nature of training}, and more specifically from two sources of randomness: (i)~every Stochastic Gradient Descent (SGD) update depends on a random batch of training examples\footnote{The same argument holds for SGD variants.}, and (ii)~weights are randomly initialized at the beginning of training\footnote{Despite being independent and identically distributed random variables, weight initialization values play an important role in guiding the SGD trajectory \citep{Frankle2019TheNetworks}.}. From the attacker's subjective point of view, the target parameters obtained at the end of training are random variables.

We argue that $\theta_t$ is approximately distributed according to the posterior distribution $p(\theta | \mathcal{D})$. \citet{Mingard2020IsAlmost} observe a strong correlation between the probability of obtaining, with SGD or its variants, a function consistent with a training set and the Bayesian posterior probability of this function. \citet{Mandt2017StochasticInference} show that SGD with a constant learning rate has a stationary distribution centred on an optimum, which approximates a posterior. Marginalizing over local optima, we obtain a posterior that is the distribution of SGD endpoints with a step decay learning rate schedule (as widely used).

Then, \textit{the best transferable adversarial example approximately minimizes the Bayesian posterior predictive distribution} $p(y | x, \mathcal{D}) =  \E_{p(\theta | \mathcal{D})} \hat p(y | x, \theta)$
and our black-box attack objective is:

\begin{equation}\label{eq:delta-star}
\delta^* \in \argmin_{\| \delta \|_p \leq \varepsilon} \E_{\theta_t \sim p(\theta  | \mathcal{D})} \hat p(y | x+\delta, \theta_t).
\end{equation}


Usually in adversarial machine learning, transferable adversarial examples are optimized against one surrogate model. This is similar to solving problem (\ref{eq:delta-star}) deterministically by approximating the expectation of the posterior predictive with a ``plug-in'' estimate of the parameters, the maximum a posteriori (MAP) estimate $\hat \theta_\text{MAP}$: $\delta^* \approx \delta_{\hat \theta_\text{MAP}}$. To avoid overfitting to the surrogate model, random transformations of inputs or prediction functions have been developed in the literature (see \cref{sec:related}).

A fundamental issue is that the posterior predictive distribution has no tractable closed form for DNNs. Our contribution lies in \textit{sampling from the posterior distribution to build a surrogate in black-box adversarial attacks}. We replace the crude MAP approximation of the posterior predictive distribution with a more accurate one to generate transferable adversarial examples. Therefore, we focus on the training phase by considering the methods and the computational costs of obtaining the surrogate model, whereas most previous work seeks to optimize the crafting of adversarial examples at the time of the attack (``test-time'').

\textbf{SG-MCMC \& cSGLD.} In practice, we perform Bayesian model averaging using samples obtained from Stochastic Gradient Markov Chain Monte Carlo (SG-MCMC).
SG-MCMC is a family of approximate Bayesian inference techniques, inaugurated by SGLD \citep{Welling2011a}, that combines SGD with MCMC. Adding noise during training makes it possible to sample from the posterior distribution of the parameters. The empirical distribution of the samples approximates the posterior. Our method then aims to solve the following optimization problem:
\begin{equation}\label{eq:delta-star-bayesian}
\delta_{\{\theta_s\}} \in \argmin_{\| \delta \|_p \leq \varepsilon} \frac{1}{S} \sum_{s=1}^S \hat p(y | x+\delta, \theta_s) \;,
\end{equation}
where $\{\theta_s \sim p(\theta | \mathcal{D})\}_{s=1}^S$ are samples of the posterior.

We choose to apply the recently proposed \textit{cyclical Stochastic Gradient Langevin Dynamics} (cSGLD) \citep{Zhang2020CyclicalLearning}, a state-of-the-art SG-MCMC technique. cSGLD performs warm restarts by dividing the training into cycles that all start from the initial learning rate value (cf. illustration in \cref{app-sec:xp-setup-appendix}). Each cycle consists of (1)~an exploration stage with larger learning rates, which corresponds to the burn-in period of MCMC algorithms; and (2)~a sampling stage that samples parameters at regular intervals and operates with smaller learning rates and added noise. Starting a new cycle with a large learning rate allows the exploration of another local minimum of the loss landscape. Contrary to most SG-MCMC methods, cSGLD has the compelling benefit of sampling both from several modes of the posterior distribution and locally inside each mode, avoiding mode collapse. Another major advantage of cSGLD is that its computational overhead compared to SGD/Adam is negligible (0.019\% flops for one epoch on PreResNet110 on CIFAR-10 and 0.015\% for ResNet50 on ImageNet).
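
For reference, the two stages can be sketched as follows, based on our reading of \citet{Zhang2020CyclicalLearning}. The notation is introduced only for this illustration and is not used elsewhere in this paper: $K$ is the total number of training iterations, $M$ the number of cycles, $\alpha_0$ the initial step size, $\beta \in (0,1)$ the proportion of a cycle spent in exploration, $\tilde U$ a mini-batch estimate of the negative log-posterior, $\epsilon_k \sim \mathcal N(0, I)$ the injected noise, and $r_k = \operatorname{mod}(k-1, \lceil K/M \rceil) / \lceil K/M \rceil$ the relative position of iteration $k$ within its cycle.
\begin{align*}
\alpha_k &= \frac{\alpha_0}{2} \left[ \cos\left( \pi \, r_k \right) + 1 \right], \\
\theta_{k+1} &=
\begin{cases}
\theta_k - \alpha_k \nabla \tilde U(\theta_k) & \text{if } r_k < \beta, \\
\theta_k - \alpha_k \nabla \tilde U(\theta_k) + \sqrt{2 \alpha_k} \, \epsilon_k & \text{otherwise}.
\end{cases}
\end{align*}
In this sketch, the step size restarts at $\alpha_0$ at the beginning of every cycle ($r_k = 0$) and decays towards zero within it. The first case is the exploration stage and the second is the sampling stage, which is the only one that injects Gaussian noise.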

\textbf{Difference with Ensembling.} Our work differs from previous research \citep{Liu2017DelvingAttacks,Li2018LearningNetworks,Xie2019}, which relates diversity to transferability, in the same way that Ensembling differs from Bayesian Model Averaging \citep{MinkaThomasP.2002BayesianCombination}. The latter ``assumes that the true model lies within the hypothesis class of the prior, and performs soft model selection [...]. In contrast, ensembles [...] combine the models to obtain a more powerful model; ensembles can be expected to be better when the true model does not lie within the hypothesis class'' \citep{Lakshminarayanan2016SimpleEnsembles}. Under Assumption~\ref{assum:threat-model}, the unknown target model, our true model here, lies within the hypothesis class of its prior by definition. Therefore, we argue that under these conditions, a Bayesian approach is a more natural way to select a surrogate model.

\textbf{Target prior.} We express the prior of a standard target DNN. Deterministic DNNs are classically trained using the cross-entropy loss regularized by weight decay:

% \min_{\theta_t} \mathcal{L}(\theta_t)
$$ \min_{\theta_t} - \dfrac{1}{N} \sum_{i=1}^N \log \hat p(y_i|x_i, \theta_t) + \dfrac{\lambda}{2}\| \theta_t \|^2 , $$

where $\lambda$ is the weight decay hyperparameter. This maximum likelihood estimation (MLE) procedure corresponds to maximum a posteriori (MAP) inference in the following implied probabilistic model:

$$ p(y, \theta_t | x) = p(y | x, \theta_t) p(\theta_t), $$

where $p(y | x, \theta_t)$ is the likelihood function and $p(\theta_t) = \mathcal N (\theta_t| 0, \frac{1}{N\lambda} I)$ a Gaussian prior. Therefore, in this standard setting, hypothesis 4 of Assumption~\ref{assum:threat-model} reduces to knowing the weight decay hyperparameter $\lambda$.
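
To make this correspondence explicit, the short worked derivation below starts from the MAP objective, uses the Gaussian prior above, and divides by $N$. It is a sketch that assumes i.i.d. training examples and drops additive constants; it introduces no quantity beyond those already defined.
\begin{align*}
\hat \theta_{\text{MAP}} &\in \argmax_{\theta_t} \; \sum_{i=1}^N \log \hat p(y_i | x_i, \theta_t) + \log p(\theta_t) \\
&= \argmin_{\theta_t} \; - \sum_{i=1}^N \log \hat p(y_i | x_i, \theta_t) + \frac{N \lambda}{2} \| \theta_t \|^2 \\
&= \argmin_{\theta_t} \; - \frac{1}{N} \sum_{i=1}^N \log \hat p(y_i | x_i, \theta_t) + \frac{\lambda}{2} \| \theta_t \|^2 ,
\end{align*}
since $- \log \mathcal N(\theta_t | 0, \sigma^2 I) = \frac{1}{2 \sigma^2} \| \theta_t \|^2 + \text{const}$ with $\sigma^2 = \frac{1}{N \lambda}$. As a purely illustrative example with hypothetical values (not those of our experiments), $\lambda = 5 \times 10^{-4}$ and $N = 50\,000$ give a prior variance of $\frac{1}{N \lambda} = \frac{1}{25} = 0.04$.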

\textbf{Extension to unknown architecture.} Let $\mathcal{A}  = \{a_i\}_i$ be a countable set of candidate architectures, $p(a)$ a prior on $\mathcal{A}$, %\footnote{The prior can reflect the research and industry practices or be non-informative.}, 
$\theta^a $ the parameters of the architecture $a$ and $\hat p^a(y|x,\theta^a)$ its predictive distribution. Discarding hypothesis 1 of Assumption~\ref{assum:threat-model} on the knowledge of the architecture, the architecture of the target $a$ becomes a random variable. We perform \textit{Bayesian Model Comparison} to compute the posterior over models:

\begin{equation}\label{eq:model_posterior-interarch}
p(a|\mathcal{D}) \propto p(\mathcal{D}|a) p(a) .
\end{equation}

We marginalize over architectures to express the complete posterior predictive distribution as the average across architectures weighted by their posterior probabilities:

\begin{equation}\label{eq:pred_posterior-interarch}
\begin{aligned}
    p(y | x, \mathcal{D}) &= \sum_{a \in \mathcal{A}} p(a | \mathcal{D}) \, p(y | x, \mathcal{D}, a) \\
                          &\propto \E_{p(a)} \left[ p(\mathcal{D} | a) \, \E_{p(\theta^a  | \mathcal{D})} \hat p^a(y | x, \theta^a) \right].
    %p(y | x, \mathcal{D}, a).
\end{aligned}
\end{equation}

If $\mathcal{A}$ is finite and small, we can approximate this quantity with a weighted average of one cSGLD empirical posterior predictive distribution per architecture. Otherwise, we estimate it with MCMC by sampling according to $p(a)$ a finite subset $A = \{ a_i \sim p(a) \}_{i=1}^{S_A} \subset \mathcal{A}$ of architectures, where the number of architectures $S_A$ is fixed by the computational budget. We sample $S$ parameters $\{ \theta^a_s \}_{s=1}^S$ for all $a \in A$. Then, our inter-architecture attack that minimizes our approximation of $p(y | x, \mathcal{D})$ becomes:

\begin{equation}\label{eq:delta-star-interarch}
\delta_A \in \argmin_{\| \delta \|_p \leq \varepsilon} \frac{1}{S_A S} \sum_{a \in A} p(\mathcal{D} | a) \sum_{s=1}^S \hat p^a(y | x+\delta, \theta^a_s)
\end{equation}

Various methods exist to approximate model evidence \citep{Friel2012EstimatingReview}. To simplify empirical conclusions, we assume that all architectures in $\mathcal{A}$ have approximately equal evidence. This strong assumption is reasonable here, since we select widely used architectures which are well-specified on the standard benchmark datasets evaluated. For fairness to ensemble baselines, our experiments on unknown architectures do not include the target architecture in the set $\mathcal{A}$. 
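
As a short worked step that relies only on this equal-evidence assumption, the weights $p(\mathcal{D} | a)$ in Equation~\ref{eq:delta-star-interarch} become (approximately) a common positive constant, which does not change the minimizer. The inter-architecture objective then reduces to a uniform average over all sampled models:
\begin{equation*}
\delta_A \in \argmin_{\| \delta \|_p \leq \varepsilon} \frac{1}{S_A S} \sum_{a \in A} \sum_{s=1}^S \hat p^a(y | x+\delta, \theta^a_s) .
\end{equation*}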

\textbf{Attack algorithm.} One can approximate the solution of Equations~\ref{eq:delta-star-bayesian} and~\ref{eq:delta-star-interarch} with minor modifications of existing adversarial attack algorithms, i.e., by simply cycling through the surrogate models across iterations.
To efficiently approximate Bayesian model averaging during iterative attacks, we compute the gradient of every iteration on a single model sample per architecture. If multiple architectures are attacked, we average their gradients (see \cref{app-alg:attack-variant} in \cref{app-sec:xp-setup-appendix}). The cost of iterative attacks, measured as the number of backward passes, does not increase with the number of samples $S$.
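
A minimal sketch of this cycling for the L$\infty$ variant of I-FGSM is given below. It is an illustration under our own simplifying assumptions rather than a restatement of the algorithm in the supplementary materials: samples are visited in a fixed round-robin order, $n_{\text{iter}}$ denotes the number of attack iterations, and the gradient is evaluated at the current adversarial input $x + \delta$.

\begin{algorithm}[t]
\caption{Illustrative sketch of I-FGSM cycling through posterior samples (assumes a round-robin order).}
\begin{algorithmic}[1]
\REQUIRE example $(x, y)$; samples $\{\theta^a_s\}_{s=1}^S$ for each $a \in A$; budget $\varepsilon$; step size $\alpha$; number of iterations $n_{\text{iter}}$
\ENSURE perturbation $\delta$ with $\| \delta \|_\infty \leq \varepsilon$
\STATE $\delta \leftarrow 0$
\FOR{$i = 0, \dots, n_{\text{iter}} - 1$}
    \STATE $s \leftarrow (i \bmod S) + 1$ \COMMENT{one sample per architecture}
    \STATE $g \leftarrow \frac{1}{S_A} \sum_{a \in A} \nabla_x L(x + \delta, y, \theta^a_s)$
    \STATE $\delta \leftarrow \text{proj}_{B_\varepsilon}(\delta + \alpha \, \text{sign}(g))$
\ENDFOR
\STATE \textbf{return} $\delta$
\end{algorithmic}
\end{algorithm}

In this sketch, each iteration costs one backward pass per architecture, whatever the value of $S$.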

\textbf{Clarifications.} In the following, intra-architecture transferability refers to the case of a known target architecture: the mass of the prior concentrates on a single architecture, and so does that of the posterior. Conversely, inter-architecture transferability corresponds to an unknown target architecture that is not sampled in the surrogate set. The prior probability of the target architecture need not be zero, given the extension to unknown architectures described above, but we hold this architecture out of the surrogate set during empirical evaluation, for fairness to the baselines and to simplify the interpretation of results.


%---------------------------------- results

\section{Experiments}

The goal of our approach is to increase the transferability of adversarial examples by using a surrogate sampled from the posterior distribution to attack a deterministic DNN.

\textbf{Setup summary.} The target models are deterministic DNNs and are never used as a surrogate. For a fair comparison between DNNs and cSGLD, we train the surrogate DNNs on CIFAR-10 and MNIST using the same process as the target models. ImageNet targets are third-party pretrained models. Each cSGLD cycle lasts 50 epochs and samples 5 models on CIFAR-10, 10 epochs/4 models on MNIST, and 45 epochs/3 models on ImageNet. We report the success rate (misclassification rate of untargeted adversarial examples) averaged over three attack runs. We craft adversarial examples from correctly predicted test examples (all examples for CIFAR-10 and MNIST, and a random subset of 5000 examples for ImageNet). The iterative attacks (I-FGSM, MI-FGSM, and PGD) perform 50 iterations so that the transferability rates plateau (\cref{app-sec:hp-appendix}). Each attack computes the gradient of one model per architecture. Therefore, their computation cost and volatile memory are not multiplied by the size of the surrogate, except for FGSM, which computes its unique gradient against all available models. The source code is publicly available\footnote{\url{https://github.com/Framartin/transferable-bnn-adv-ex}}. \cref{app-sec:xp-setup-appendix} presents the experimental setup in detail.
%URL retracted for anonymous submission

\subsection{Intra-architecture Transferability}
\label{sec:rq1}


\begin{table}[t]
\centering
\caption{Number of DNNs (T-DEE) and relative training computation budget (flops ratio) that Deep Ensemble needs to match the intra-architecture transferability of cSGLD. Higher is better. ``\textgreater{}15'' means that an ensemble of 15 DNNs always transfers less than cSGLD.}
%\resizebox{.99\columnwidth}{!}{
\begin{tabular}{ccl|rR{3.1em}}
\toprule
\bfseries Dataset & \bfseries Attack                     & \bfseries Norm & \bfseries T-DEE & \bfseries Flops Ratio \\
\midrule
\multirow{8}{*}{ImageNet} & \multirow{2}{*}{I-FGSM} & L2  & 4.91 \tiny ±0.11        & 2.84 \tiny ±0.06            \\
        &                            & L$\infty$            & 4.34 \tiny ±0.13        & 2.51 \tiny ±0.08               \\ \cline{2-5} 
        & \multirow{2}{*}{MI-FGSM} & L2                   & 4.69 \tiny ±0.18        & 2.71 \tiny ±0.10               \\ 
        &                            & L$\infty$            & 4.38 \tiny ±0.03        & 2.53 \tiny ±0.02               \\ \cline{2-5} 
        & \multirow{2}{*}{PGD}       & L2                   & 5.00 \tiny ±0.11        & 2.89 \tiny ±0.06               \\  
        &                            & L$\infty$            & 4.42 \tiny ±0.16        & 2.56 \tiny ±0.09               \\ \cline{2-5} 
        & \multirow{2}{*}{FGSM}    & L2                   & 5.81 \tiny ±0.34        & 3.35 \tiny ±0.19               \\ 
        &                            & L$\infty$            & 5.98 \tiny ±0.03        & 3.46 \tiny ±0.02               \\ \hline
\multirow{8}{*}{CIFAR10} & \multirow{2}{*}{I-FGSM} & L2 & \textgreater{}15 \tiny ±nan & \textgreater{}15 \tiny ±nan \\
        &                            & L$\infty$ & 3.76 \tiny ±0.08              & 3.76 \tiny ±0.08               \\ \cline{2-5}
        & \multirow{2}{*}{MI-FGSM} & L2        & 5.56 \tiny ±0.80              & 5.56 \tiny ±0.80               \\
        &                            & L$\infty$ & 2.88 \tiny ±0.03              & 2.87 \tiny ±0.03               \\ \cline{2-5} 
        & \multirow{2}{*}{PGD}       & L2        & \textgreater{}15 \tiny ±nan   & \textgreater{}15 \tiny ±nan    \\
        &                            & L$\infty$ & 3.74 \tiny ±0.12              & 3.74 \tiny ±0.12               \\ \cline{2-5} 
        & \multirow{2}{*}{FGSM}    & L2 & \textgreater{}15 \tiny ±nan & \textgreater{}15 \tiny ±nan \\ 
        &                            & L$\infty$ & 8.72 \tiny ±0.01             & 8.72 \tiny ±0.01               \\ \hline
\multirow{8}{*}{MNIST}  & \multirow{2}{*}{I-FGSM}  & L2 & \textgreater{}15 \tiny ±nan & \textgreater{}15 \tiny ±nan \\ % FC
             &                            & L$\infty$ & 3.42 \tiny ±0.17            & 3.42 \tiny ±0.17            \\ \cline{2-5} 
                             & \multirow{2}{*}{MI-FGSM} & L2 & \textgreater{}15 \tiny ±nan & \textgreater{}15 \tiny ±nan \\
             &                            & L$\infty$ & 2.79 \tiny ±0.07            & 2.79 \tiny ±0.07            \\ \cline{2-5} 
             & \multirow{2}{*}{PGD}       & L2   & \textgreater{}15 \tiny ±nan & \textgreater{}15 \tiny ±nan \\
             &                            & L$\infty$ & 3.26 \tiny ±0.28            & 3.26 \tiny ±0.28            \\ \cline{2-5} 
             & \multirow{2}{*}{FGSM}    & L2   & \textgreater{}15 \tiny ±nan & \textgreater{}15 \tiny ±nan \\
             &                            & L$\infty$ & \textgreater{}15 \tiny ±nan & \textgreater{}15 \tiny ±nan \\
\bottomrule
\end{tabular}
%}
\label{tab:dee-same-arch}
\end{table}


\begin{table*}[t!]
\caption{Transfer success rates of I-FGSM attack on ImageNet hold-out architectures. Higher is better.}
\centering
\resizebox{1.99\columnwidth}{!}{
\begin{tabular}{llrrrrrL{3.1em}}
\toprule
\multicolumn{2}{c}{ } & \multicolumn{5}{c}{\bfseries Target Architecture} \\
\cmidrule(l{3pt}r{3pt}){3-7}
   \bfseries Norm  & \bfseries Surrogate &  $-$ResNet50 &  $-$ResNeXt50 &  $-$DenseNet121 &  $-$MNASNet &  $-$EffNetB0 & \bfseries Nb epochs \\
\midrule
\multirow{2}{*}{L2} & 1 cSGLD per arch. &   \textbf{ 93.28 \tiny ±0.12} &  \textbf{90.61 \tiny ±0.24} &  \textbf{92.25 \tiny ±0.26} &  \textbf{95.98 \tiny ±0.19} &     \textbf{81.88 \tiny ±0.38} & $4\times135$ \\
     & 1 DNN per arch. &    72.99 \tiny ±0.52 &  72.31 \tiny ±0.44 &  64.72 \tiny ±0.59 &  84.21 \tiny ±0.18 &     53.99 \tiny ±0.76 & $4\times135$ \\
\hline
\multirow{2}{*}{L$\infty$} & 1 cSGLD per arch. &    \textbf{92.21 \tiny ±0.23} &  \textbf{89.83 \tiny ±0.22} &  \textbf{90.86 \tiny ±0.19} &  \textbf{95.85 \tiny ±0.46} &     \textbf{79.40 \tiny ±0.42} & $4\times135$ \\
     & 1 DNN per arch. &    69.65 \tiny ±0.47 &  69.01 \tiny ±0.70 &  61.00 \tiny ±0.66 &  82.25 \tiny ±0.03 &     49.71 \tiny ±1.37 & $4\times135$ \\
\bottomrule
\end{tabular}
}
\label{tab:transfer-multi-archs-imagenet}
\end{table*}

\begin{table*}[t!]
\caption{Transfer success rates of I-FGSM attack on CIFAR-10 hold-out architectures. The $\star$ symbol indicates that 1 DNN per architecture is better than 1 cSGLD per architecture. Higher is better.}
\centering
%\resizebox{1.99\columnwidth}{!}{
\begin{tabular}{llrrrrrL{3.6em}}
\toprule
\multicolumn{2}{c}{ } & \multicolumn{5}{c}{\bfseries Target Architecture} \\
\cmidrule(l{3pt}r{3pt}){3-7}
\bfseries Norm & \bfseries Surrogate &  $-$PResNet110 &  $-$PResNet164 &  $-$VGG16 &  $-$VGG19 &  $-$WideResNet & \bfseries Nb epochs \\
\midrule
\multirow{3}{*}{L$2$} & 1 cSGLD per arch. &  \textbf{95.56 \tiny ±0.04} &  \textbf{95.72 \tiny ±0.06} &  \textbf{45.96 \tiny ±0.07} &  \textbf{42.60 \tiny ±0.08} &  \textbf{84.04 \tiny ±0.05}  &$4\times300$ \\
                    & 1 DNN per arch.   &  60.38 \tiny ±1.09 &  60.93 \tiny ±1.06 &  29.97 \tiny ±0.48 &  27.57 \tiny ±0.66 &  57.86 \tiny ±0.74 & $4\times300$ \\
                    & 4 DNNs per arch.  &  77.12 \tiny ±1.32 &  77.21 \tiny ±1.14 &  40.89 \tiny ±0.63 &  40.18 \tiny ±0.76 &  77.54 \tiny ±0.93 & $4\times1200$ \\
\cline{1-8}
\multirow{3}{*}{L$\infty$}  & 1 cSGLD per arch. &  96.38 \tiny ±0.06 &  96.51 \tiny ±0.08 &  49.19 \tiny ±0.06 &  45.17 \tiny ±0.03 &  84.75 \tiny ±0.01 & $4\times300$ \\
                            & 1 DNN per arch.   &  87.02 \tiny ±0.04 &  88.86 \tiny ±0.04 &  44.99 \tiny ±0.10 &  $\star$45.55 \tiny ±0.02 &  74.84 \tiny ±0.03 & $4\times300$ \\
                            & 4 DNNs per arch.  &  \textbf{96.50 \tiny ±0.01} &  \textbf{97.01 \tiny ±0.02} &  \textbf{59.80 \tiny ±0.01} &  \textbf{59.08 \tiny ±0.01} &  \textbf{89.23 \tiny ±0.04} & $4\times1200$ \\
\bottomrule
\end{tabular}
%}
\label{tab:transfer-multi-archs}
\end{table*}




Since SG-MCMC methods sample the weights of a given architecture, we expect our approach to work particularly well in settings where the architecture of the target model is known, but not its weights. 
To demonstrate this, we compare the intra-architecture transfer success rates of cSGLD with those of Deep Ensemble surrogates (using 1 up to 15 independently trained DNNs). 
Architectures are ResNet-50 (ImageNet), PreResNet110 (CIFAR-10) and fully connected 1200-1200 (MNIST). 

\cref{app-sec:intra-arch-appendix} provides the detailed results for four classical gradient-based attacks on the three datasets. 
In summary, for a similar computation cost on ImageNet and CIFAR-10, cSGLD systematically increases the success rate of iterative attacks by 13.8 (ImageNet, MI-FGSM, L$\infty$) to 49.2 (CIFAR-10, I-FGSM, L2) percentage points, and of FGSM by 12.18 to 22.2. On MNIST, the improvement ranges from 6.8 to 80.5 percentage points. One explanation for the highest improvements is that DNN-based L2 norm attacks suffer from vanishing gradients on CIFAR-10 and MNIST, whereas cSGLD avoids them thanks to fast convergence and warm restarts (cf. \cref{app-sec:vanishing-grads-appendix} for proportions of vanished gradients).

Inspired by DEE \citep{Ashukha2020PitfallsLearning}, we propose the \textbf{Transferability-Deep Ensemble Equivalent (T-DEE)} metric, defined as the number of independently trained DNNs needed to achieve the same success rate as the technique considered (computed with linear interpolation). Under some assumptions\footnote{Besides Assumptions~\ref{assum:threat-model}, we suppose that Deep Ensemble uses the target optimizer, and that the minimum in Eq.~\ref{eq:delta-star-bayesian} is reached, i.e., that the attack does not fail due to vanished or obfuscated gradients \citep{Athalye2018ObfuscatedExamples}.}, Deep Ensemble samples exactly from the distribution of target parameters, and is thus optimal for intra-architecture transferability with infinite computing power.
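
As an illustration of the interpolation step, T-DEE can be written as follows. The notation is introduced here for exposition only and is an assumption, not part of the original definition: $s_{\mathrm{DE}}(n)$ denotes the success rate of a Deep Ensemble surrogate of $n$ independently trained DNNs, $s$ the success rate of the technique considered, and $n^\star$ the largest $n \in \{1, \dots, 14\}$ such that $s_{\mathrm{DE}}(n) \le s$. Assuming $s_{\mathrm{DE}}(1) \le s \le s_{\mathrm{DE}}(15)$ and $s_{\mathrm{DE}}(n^\star) < s_{\mathrm{DE}}(n^\star + 1)$,
\begin{equation*}
  \text{T-DEE} = n^\star + \frac{s - s_{\mathrm{DE}}(n^\star)}{s_{\mathrm{DE}}(n^\star + 1) - s_{\mathrm{DE}}(n^\star)}.
\end{equation*}
For instance, with the hypothetical values $s_{\mathrm{DE}}(4) = 80\%$, $s_{\mathrm{DE}}(5) = 84\%$ and $s = 83\%$, this gives $\text{T-DEE} = 4 + 3/4 = 4.75$. When $s > s_{\mathrm{DE}}(15)$, no ensemble of up to 15 DNNs matches the technique, and the value is reported as ``\textgreater{}15''.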

Table \ref{tab:dee-same-arch} reports the T-DEE and the \emph{computing ratio}, i.e., the total number of flops needed to train such an ensemble of DNNs divided by the number of flops used to train cSGLD. This ratio represents the computational gain factor achieved by our approach\footnote{The ImageNet computing ratios do not equal the T-DEE since 1 DNN is trained for 130 epochs and cSGLD for 225.}.
In the worst case across the three datasets, an ensemble of 3 surrogate DNNs is required to beat the cSGLD surrogate, while requiring at least $2.51$ times more flops during the training phase. On CIFAR-10 and MNIST, for L$2$ attacks specifically (MI-FGSM on CIFAR-10 aside), cSGLD even outperforms the ensemble of 15 DNNs by a significant margin (up to 71.2 percentage points). On ImageNet, cSGLD achieves the same success rate as 4.34--5.98 DNNs, which corresponds to dividing the number of flops by 2.51--3.46.  
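
The relation between the two columns of Table~\ref{tab:dee-same-arch} can be made explicit with a short worked computation. As a sketch, assume that one training epoch costs the same number of flops for a DNN and for cSGLD on a given architecture (an assumption made here for illustration), and write $E_{\mathrm{DNN}}$ and $E_{\mathrm{cSGLD}}$ for the respective numbers of training epochs (notation introduced here). The computing ratio then reduces to
\begin{equation*}
  \text{ratio} = \text{T-DEE} \times \frac{E_{\mathrm{DNN}}}{E_{\mathrm{cSGLD}}}.
\end{equation*}
With the ImageNet budgets of 130 and 225 epochs, the I-FGSM L2 row gives $4.91 \times 130/225 \approx 2.84$, and the FGSM L$\infty$ row gives $5.98 \times 130/225 \approx 3.46$, in agreement with the reported ratios. On CIFAR-10 and MNIST the two columns essentially coincide, which under this relation is consistent with $E_{\mathrm{DNN}} = E_{\mathrm{cSGLD}}$.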

Thus, the uncertainty on parameter estimation captured by cSGLD is useful to discover generic adversarial directions. 
%The efficiency of this technique emphasizes the benefits that one can gain from the cyclical nature of cSGLD, thanks to warm restarts, fast convergence, and the local characterization of each mode of the parameters posterior.



\subsection{Inter-architecture Transferability}
\label{sec:inter-arch-transf}


We now focus on black-box settings where the architecture of the target model is unknown (and not used to build the surrogate model). We consider ten architectures (five each for ImageNet and CIFAR-10). Following \citet{Liu2017DelvingAttacks,Xie2019,Li2018LearningNetworks,Dong2018},
we hold out one architecture to act as the target model and use the four remaining ones as surrogates. %We train every architecture with cSGLD to obtain the same number of samples for each of them. 
We apply I-FGSM with 1 model per surrogate architecture per iteration to keep attack cost constant.  Due to computational limitations, we limit the training to 135 epochs on ImageNet (3 cycles of 45 epochs for cSGLD). For every architecture, cSGLD and 1 DNN are trained for the same number of epochs.

As shown in Tables \ref{tab:transfer-multi-archs-imagenet} (ImageNet) and \ref{tab:transfer-multi-archs} (CIFAR-10), our method significantly improves transferability on all five hold-out architectures for both datasets, except for the L$\infty$ VGG19 target on CIFAR-10 (with a difference of 0.4 percentage points). On CIFAR-10, the differences range from 15.0 to 35.2 percentage points ($2$-norm), and from -0.4 to 9.9 ($\infty$-norm). Our method outperforms 4 DNNs per architecture on the L$2$ attack, despite being trained for 4 times fewer epochs. On ImageNet, cSGLD improves over the one DNN counterpart by 11.8 to 29.9 percentage points of success rate at constant computational train and attack budget. 
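
As a worked check, the extremes of the CIFAR-10 ranges can be recovered from Table~\ref{tab:transfer-multi-archs} by subtracting the ``1 DNN per arch.'' success rate from the ``1 cSGLD per arch.'' one for the same norm and target (values in percentage points, rounded to one decimal in the text):
\begin{align*}
  \text{L$2$, PResNet110:} \quad & 95.56 - 60.38 = 35.18, \\
  \text{L$2$, VGG19:} \quad & 42.60 - 27.57 = 15.03, \\
  \text{L$\infty$, WideResNet:} \quad & 84.75 - 74.84 = 9.91, \\
  \text{L$\infty$, VGG19:} \quad & 45.17 - 45.55 = -0.38.
\end{align*}
The same subtraction on Table~\ref{tab:transfer-multi-archs-imagenet} gives $95.98 - 84.21 = 11.77$ (L$2$, MNASNet) and $90.86 - 61.00 = 29.86$ (L$\infty$, DenseNet121) as the extremes on ImageNet.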

\cref{app-sec:inter-arch-appendix} presents the results for an alternative protocol where we use a single architecture as surrogate. In summary, in this setup cSGLD achieves a higher inter-architecture success rate in 39/40 cases on ImageNet, 38/40 cases on CIFAR-10, and in 18/18 cases on MNIST, compared to a single DNN trained for the same number of epochs. Differences range between -0.3 and 44.8 percentage points on ImageNet, -2.3 and 62.1 on CIFAR-10 and 0.2 and 83.2 on MNIST.

We conclude that our method improves transferability even when the target architecture is unknown. This tends to indicate that the adversarial directions against the posterior predictive distribution are partially aligned across different architectures. In other words, given a common classification task, %there might exist a part of the uncertainty in parameters' estimate that is common across architectures, and 
the variability of one architecture's parameters might be informative of the variability of another architecture's parameters.


\subsection{Test-time Transferability Techniques}


\begin{table*}[t!]
\caption{Transfer success rates of (M)I-FGSM improved by our approach combined with test-time techniques on ImageNet~(in \%). Target in column. ResNet50 is intra-architecture transferability, others are inter-architecture. Bold is best. Symbols $\star$ are DNN-based techniques better than our vanilla cSGLD surrogate, $\dagger$ are techniques that degrade their vanilla surrogate. All techniques improve with cSGLD compared to 1 DNN.}
\centering
%\resizebox{1.99\columnwidth}{!}{
\begin{tabular}{lllrrrrr}
\toprule
\multicolumn{2}{c}{ } & \multicolumn{5}{c}{\bfseries Target Architecture} \\
\cmidrule(l{3pt}r{3pt}){3-7}
\bfseries Norm     & \bfseries Surrogate   &      ResNet50 &     ResNeXt50 &   DenseNet121 &       MNASNet & EffNetB0 \\
\midrule
\multirow{16}{*}{L$2$} & 1 DNN &  56.60 \tiny ±0.71 &  41.09 \tiny ±0.61 &  29.73 \tiny ±0.30 &  28.13 \tiny ±0.17 &   16.64 \tiny ±0.33 \\
     & \quad+ Input Diversity &  83.15 \tiny ±0.30 &  73.17 \tiny ±0.80 &  61.24 \tiny ±0.58 &  58.16 \tiny ±0.36 &   $\star$42.10 \tiny ±0.36 \\
     & \quad+ Skip Gradient Method &  65.64 \tiny ±0.88 &  52.75 \tiny ±0.42 &  38.58 \tiny ±0.55 &  43.40 \tiny ±0.61 &   29.11 \tiny ±0.30 \\
     & \quad+ Ghost Networks &  78.84 \tiny ±0.46 &  62.46 \tiny ±0.38 &  45.76 \tiny ±0.02 &  41.44 \tiny ±0.58 &   25.77 \tiny ±0.11 \\
     & \quad+ Momentum (MI-FGSM) &  $\dagger$52.53 \tiny ±0.80 &  $\dagger$37.15 \tiny ±0.76 &  $\dagger$26.33 \tiny ±0.48 &  $\dagger$25.21 \tiny ±0.42 &   $\dagger$14.74 \tiny ±0.31 \\
     & \quad\quad+ Input Diversity &  80.81 \tiny ±0.72 &  69.55 \tiny ±0.83 &  56.73 \tiny ±0.39 &  54.16 \tiny ±0.05 &   37.07 \tiny ±0.03 \\
     & \quad\quad+ Skip Gradient Method &  65.65 \tiny ±0.95 &  53.25 \tiny ±0.18 &  38.79 \tiny ±0.62 &  44.33 \tiny ±0.63 &   29.45 \tiny ±0.28 \\
     & \quad\quad+ Ghost Networks &  71.50 \tiny ±0.12 &  53.45 \tiny ±0.65 &  37.39 \tiny ±0.47 &  34.53 \tiny ±0.69 &   20.29 \tiny ±0.36 \\
     \cline{2-7}
     & cSGLD &  84.83 \tiny ±0.55 &  74.73 \tiny ±0.82 &  71.45 \tiny ±0.56 &  60.14 \tiny ±0.44 &   39.71 \tiny ±0.20 \\
     & \quad+ Input Diversity &  \textbf{93.87 \tiny ±0.19} &  \textbf{89.12 \tiny ±0.24} &  \textbf{88.52 \tiny ±0.16} &  \textbf{82.78 \tiny ±0.28} &   \textbf{66.13 \tiny ±0.35} \\
     & \quad+ Skip Gradient Method &  $\dagger$83.17 \tiny ±0.85 &  $\dagger$72.79 \tiny ±1.06 &  $\dagger$66.19 \tiny ±0.89 &  71.71 \tiny ±0.41 &   52.66 \tiny ±0.31 \\
     & \quad+ Ghost Networks &  92.99 \tiny ±0.13 &  85.69 \tiny ±0.24 &  82.81 \tiny ±0.42 &  72.88 \tiny ±0.30 &   50.30 \tiny ±0.29 \\
     & \quad+ Momentum (MI-FGSM) &  $\dagger$82.44 \tiny ±0.19 &  $\dagger$70.93 \tiny ±1.04 &  $\dagger$66.19 \tiny ±0.56 &  $\dagger$55.51 \tiny ±0.59 &   $\dagger$34.49 \tiny ±0.59 \\
     & \quad\quad+ Input Diversity &  93.48 \tiny ±0.23 &  87.87 \tiny ±0.15 &  86.81 \tiny ±0.33 &  80.37 \tiny ±0.20 &   60.26 \tiny ±0.02 \\
     & \quad\quad+ Skip Gradient Method &  $\dagger$82.35 \tiny ±0.10 &  $\dagger$71.54 \tiny ±0.58 &  $\dagger$64.50 \tiny ±0.18 &  70.47 \tiny ±0.22 &   50.80 \tiny ±0.23 \\
     & \quad\quad+ Ghost Networks &  90.11 \tiny ±0.18 &  80.35 \tiny ±0.61 &  75.10 \tiny ±0.67 &  64.08 \tiny ±0.12 &   39.85 \tiny ±0.52 \\
\hline
\multirow{16}{*}{L$\infty$} & 1 DNN &  47.81 \tiny ±1.09 &  32.29 \tiny ±0.64 &  23.43 \tiny ±0.32 &  22.52 \tiny ±0.45 &   12.77 \tiny ±0.32 \\
     & \quad+ Input Diversity &  76.55 \tiny ±1.01 &  62.57 \tiny ±0.56 &  50.17 \tiny ±0.33 &  49.31 \tiny ±0.18 &   $\star$32.64 \tiny ±0.09 \\
     & \quad+ Skip Gradient Method &  66.36 \tiny ±0.50 &  51.60 \tiny ±0.36 &  39.05 \tiny ±0.24 &  45.60 \tiny ±0.72 &   30.69 \tiny ±0.03 \\
     & \quad+ Ghost Networks &  67.02 \tiny ±0.17 &  46.74 \tiny ±0.63 &  32.57 \tiny ±0.17 &  31.12 \tiny ±0.77 &   17.68 \tiny ±0.05 \\
     & \quad+ Momentum (MI-FGSM) &  55.12 \tiny ±0.82 &  38.47 \tiny ±0.82 &  28.19 \tiny ±0.14 &  27.55 \tiny ±0.67 &   16.34 \tiny ±0.37 \\
     & \quad\quad+ Input Diversity &  $\star$82.47 \tiny ±0.41 &  $\star$69.69 \tiny ±0.81 &  57.79 \tiny ±0.57 &  $\star$55.99 \tiny ±0.37 &   $\star$38.63 \tiny ±0.29 \\
     & \quad\quad+ Skip Gradient Method &  68.39 \tiny ±0.53 &  54.57 \tiny ±0.60 &  41.48 \tiny ±0.37 &  47.97 \tiny ±0.41 &   $\star$33.16 \tiny ±0.37 \\
     & \quad\quad+ Ghost Networks &  71.27 \tiny ±0.54 &  51.46 \tiny ±0.84 &  36.91 \tiny ±0.48 &  34.54 \tiny ±0.32 &   20.51 \tiny ±0.30 \\
     \cline{2-7}
     & cSGLD &  78.71 \tiny ±1.19 &  65.11 \tiny ±1.45 &  61.49 \tiny ±0.59 &  51.81 \tiny ±1.45 &   31.11 \tiny ±0.99 \\
     & \quad+ Input Diversity &  90.03 \tiny ±0.10 &  82.13 \tiny ±0.45 &  81.19 \tiny ±0.34 &  74.48 \tiny ±0.39 &   53.51 \tiny ±0.39 \\
     & \quad+ Skip Gradient Method &  81.37 \tiny ±0.72 &  69.88 \tiny ±1.31 &  65.20 \tiny ±0.75 &  71.68 \tiny ±0.53 &   52.15 \tiny ±0.32 \\
     & \quad+ Ghost Networks &  87.33 \tiny ±0.73 &  76.00 \tiny ±1.33 &  71.67 \tiny ±0.97 &  61.45 \tiny ±0.25 &   37.19 \tiny ±0.68 \\
     & \quad+ Momentum (MI-FGSM) &  82.89 \tiny ±0.70 &  70.42 \tiny ±1.26 &  66.39 \tiny ±0.74 &  56.68 \tiny ±0.97 &   36.00 \tiny ±1.15 \\
     & \quad\quad+ Input Diversity &  \textbf{93.97 \tiny ±0.26} &  \textbf{87.69 \tiny ±0.44} &  \textbf{86.78 \tiny ±0.16} &  \textbf{81.08 \tiny ±0.14} &   \textbf{60.87 \tiny ±0.48} \\
     & \quad\quad+ Skip Gradient Method &  84.19 \tiny ±0.21 &  73.14 \tiny ±0.99 &  67.35 \tiny ±0.26 &  74.36 \tiny ±0.47 &   55.30 \tiny ±0.16 \\
     & \quad\quad+ Ghost Networks &  89.53 \tiny ±0.05 &  78.69 \tiny ±0.19 &  73.33 \tiny ±0.58 &  63.56 \tiny ±0.35 &   39.79 \tiny ±0.52 \\
\bottomrule
\end{tabular}
%}
\label{tab:transfer-test-time-techs-imagenet}
\end{table*}


Given that our approach works at train time, we evaluate its combination with test-time techniques. We apply three test-time transformations to cSGLD samples and one DNN obtained with the same number of epochs (300 for CIFAR-10, 135 for ImageNet). The surrogate architecture is ResNet50 on ImageNet and PreResNet110 on CIFAR-10. The targets are the same as in \cref{sec:inter-arch-transf}. Following \citet{Li2018LearningNetworks,Xie2019,Wu2020SkipResNets}, we also combine every test-time technique with momentum\footnote{All rows with momentum correspond to MI-FGSM, an attack variant designed to improve transferability~\citep{Dong2018}.}.

Table \ref{tab:transfer-test-time-techs-imagenet} shows the results on ImageNet (see \cref{app-sec:test-techs-appendix} for CIFAR-10). We observe that our approach and the test-time techniques complement each other well. Indeed, the best success rates are always achieved by a technique applied to cSGLD (in bold). All three techniques combined with momentum applied to cSGLD achieve a systematically higher success rate than the same technique applied to 1 DNN, with differences ranging from 10.7 to 41.7 percentage points on ImageNet and from 3.8 to 56.2 on CIFAR-10. Overall, the addition of a technique (excluding momentum alone) to our vanilla cSGLD surrogate never decreases the success rate on CIFAR-10, and decreases it in only 10\% of the averaged cases considered on ImageNet, as indicated by the $\dagger$ symbols.

Besides, our vanilla cSGLD surrogate achieves better transferability than any of the test-time techniques applied to 1 DNN in 90\% of the cases on CIFAR-10 and 93.3\% on ImageNet, using the I-FGSM attack. Similarly, for MI-FGSM, we observe 76.7\% for the former and 90\% for the latter. This demonstrates that despite previous efforts in providing effective test-time techniques for transferability (see \cref{sec:related}), \emph{improving the training of the surrogate -- in our case, through efficient sampling from the posterior distribution -- yields significantly higher improvements}. Hence, while training approaches have been overlooked, canonical elements that have been related to transferability, i.e., skip connections~\citep{Wu2020SkipResNets}, input~\citep{Xie2019} and model diversity~\citep{Li2018LearningNetworks}, should be put into perspective compared to the importance that the posterior distribution appears to have.



\subsection{Bayesian and Ensemble Techniques}

In addition to cSGLD and Deep Ensemble, we explore the use of other training techniques to improve transferability: two other Bayesian techniques -- Variational Inference (VI) and Stochastic Weight Averaging-Gaussian (SWAG) -- and two other ensembling techniques -- Snapshot ensembles (SSE) and Fast Geometric Ensembling (FGE).  %Following the work of \citet{Ashukha2020PitfallsLearning}, we consider the effectiveness of several Bayesian training techniques (cSGLD, VI and SWAG) and Ensemble ones (Deep Ensemble, SSE and FGE). 
We train each for an equivalent computational cost of 3 DNNs on CIFAR-10 and 2 DNNs on ImageNet (except for VI and SWAG, see discussion in \cref{app-sec:bayesian-ensemble-techniques-appendix}).
Figure \ref{fig:ensemble-technique-Linf} presents the success rate of the L$\infty$ I-FGSM attack with the corresponding training computational cost (in flops), as we increase the number of models in each surrogate. \cref{app-sec:bayesian-ensemble-techniques-appendix} contains details on the methods and the results for L$2$ I-FGSM. 

On CIFAR-10, the success rate of the first 4 cycles of cSGLD increases substantially from one cycle to the next (from 76.58\% to 81.56\% for the first to the second cycle) and within a single cycle (from 81.56\% to 87.20\% between the start and the end of the second cycle). This reveals that exploring modes of the posterior plays an important role in generating transferable adversarial examples, and that there is some local geometric discrepancy of the loss landscape among local maxima. On ImageNet, transferability improves mainly by sampling from several local optima.

Interestingly, even though FGE and SWAG build an ensemble around a single local optimum, their flexibility allows capturing general adversarial directions. The FGE surrogates trained for more than 0.30 petaflops have systematically higher success rates than cSGLD and SSE on CIFAR-10. 
However, the opposite is observed on ImageNet: FGE is not competitive with methods exploring several local optima (cSGLD, SSE, and Deep Ensemble). We hypothesize that modes are not as well-connected on larger datasets.

The efficiency of SWAG on both datasets opens new directions to create hybrid attacks based on a few additional iterations over the training set. SWAG approximates the posterior with a Gaussian fitted on some additional SGD epochs from a pretrained DNN. It captures well the shape of the true posterior \citep{Maddox2019ALearning}, reinforcing our views on the strong relationship between the posterior and transferability. The success rate gap between cSGLD/SSE and SWAG on ImageNet suggests higher geometrical discrepancies between local optima of the loss on larger datasets.

VI fails to compete with Deep Ensemble on both success rate and computational efficiency for the L$\infty$ attack on CIFAR-10, but beats it under the L$2$ bound and on ImageNet.

On CIFAR-10, the marginal impact beyond 6 cSGLD cycles, 17 SWAG samples, 7 SSE models, and 35 FGE models becomes noisy. We hypothesize that correlated samples produce these limitations. Hence, the use of multiple runs is a promising direction for greater transferability.


\begin{figure}[!ht]
    \centering
        \subfloat[ImageNet]{\includegraphics[width=0.4\textwidth]{figure/RQ_techniques_imagenet_transfer_vs_time_Linf.pdf}}
        \qquad
        \subfloat[CIFAR-10]{\includegraphics[width=0.4\textwidth]{figure/RQ_techniques_transfer_vs_time_Linf.pdf}}
    \caption{Intra-architecture L$\infty$ I-FGSM success rate with respect to the training computational complexity of an increasing number of samples from six training techniques.}
    \label{fig:ensemble-technique-Linf}
\end{figure}


\subsection{Threats to Validity}

External validity threats arise from generalization outside the context of the study. First, our results may not generalize to non $p$-norm constrained adversarial examples. However, this way of ensuring imperceptibility is common to all the related work on transferability that we know of. We also systematically evaluate L$2$ and L$\infty$ attacks, while most previous studies do not. Second, similar to all competitive approaches, we consider benchmark datasets of classification in computer vision. The generalization of our conclusions to other domains and tasks could require a dedicated study. Finally, we feed adversarial examples directly to the target model. Evaluating adversarial examples through the physical domain may degrade success rates significantly.

Internal validity threats come from the design of the study. Our approach relies on the empirical fact that SGD is approximately a Bayesian sampler~\citep{Mingard2020IsAlmost}. A definitive proof would strengthen the premise of our paper. Moreover, despite our best effort to control confounding factors, some may exist, such as training hyperparameters.

Construct validity threats arise from metrics that are not suitable for evaluation. Our T-DEE metric might not be reliable when the success rate is not increasing with the number of independently trained DNNs. None of our experiments exhibit this (except L$2$ FGSM on MNIST, see supplementary materials).


%---------------------------------- conclusion

\section{Conclusion and Future Work}

We are the first to extensively investigate training-time approaches to enhance transferability. We discover a strong connection between the posterior predictive distribution and both intra- and inter-architecture transferability. Our Bayesian surrogate is efficient and effective at crafting adversarial examples transferable to deterministic DNNs. Our approach further improves existing adversarial attacks and test-time transferability techniques, as one can use it on top of them to perform approximate Bayesian model averaging efficiently and with minimal modifications. We show that our simple training-time approach improves transferability more than previous test-time techniques. We therefore highlight an important yet overlooked direction to explain transferability and pave the way for new hybrid attacks. Overall, we provide new evidence that the Bayesian framework is a promising direction for research on adversarial examples.

Our studied threat model relates mostly to uncertainty in parameter estimation. A promising avenue is to explore how other settings change the types of uncertainty. Not knowing the training dataset would increase the aleatoric uncertainty. Adding a defence such as random input transformation \citep{Xie2018MitigatingRandomization} would increase the epistemic uncertainty if its presence is unknown, and the aleatoric uncertainty through its randomness.

Another interesting direction for future work is the transferability to adversarially trained targets. If weight distributions of regular and adversarial training are orthogonal, the latter might be an effective countermeasure to our method. %Unless cSGLD can be adapted to adversarial training.


\begin{acknowledgements} 
This work is supported by the Luxembourg National Research Fund (FNR) through CORE project C18/IS/12669767/STELLAR/LeTraon.
\end{acknowledgements}

% add citations from supp. materials
\nocite{ILSVRC15,Krizhevsky2009LearningImages,Xie2017,Tan2018,Tan2019,He2016IdentityNetworks,Simonyan2015VeryRecognition,Zagoruyko2016WideNetworks,NEURIPS2019_9015,Croce2020ReliableAttacks,art2018,Huang2017SnapshotFree,Garipov2018LossDNNs}

\bibliography{references}




% To be included in a separate file
%\clearpage

%\newpage
%\appendix
%\onecolumn


%\input{7-supplementary}




% \section{Introduction}\label{sec:intro}
% UAI 2022 papers have to be prepared using \LaTeX.
% To start writing your paper, copy \texttt{uai2022-template.tex} and replace title, authorship, and content with your own.

% The UAI 2022 paper style is based on a custom \textsf{uai2022} class.
% The class file sets the page geometry and visual style.\footnote{%
%     The class uses the packages \textsf{adjustbox}, \textsf{environ}, \textsf{letltxmacro}, \textsf{geometry}, \textsf{footmisc}, \textsf{caption}, \textsf{textcase}, \textsf{titlesec}, \textsf{titling}, \textsf{authblk}, \textsf{enumitem}, \textsf{microtype}, \textsf{lastpage}, and \textsf{kvoptions}.
% }
% The class file also loads basic text fonts.\footnote{%
%     Fonts loaded are \textsf{times} (roman), \textsf{helvet} (sanserif), \textsf{courier} (fixed-width), and \textsf{textcomp} (common symbols).
% }
% \emph{You may not modify the geometry or style in any way, for example, to squeeze out a little bit of extra space.}
% (Also do not use \verb|\vspace| for this.)
% Feel free to use convenience functionality of loaded packages such as \textsf{enumitem}.
% The class enables hyperlinking by loading the \textsf{hyperref} package.

% You are free to load any packages available in \TeX{Live}~2020 that are compatible with the UAI class.\footnote{In case this template or your submission does not compile, always first make sure your \TeX\ installation is up-to-date.}
% (Mik\TeX{} and Mac\TeX{} generally contain the same packages.)
% Do not load conflicting packages—you will get an error message—, as this complicates creating the proceedings.
% Please avoid using obsolete commands, such as \verb|\rm|, and obsolete packages, such as \textsf{epsfig}.\footnote{%
%     See \url{https://ctan.org/pkg/l2tabu}.
% }

% \swap[ ]{in the header of your source file.}{Feel free to include your own macros}

% \section{General Formatting Instructions}
% As a general rule: \emph{follow the template}.

% \subsection{Authorship}
% Reviewing is double-blind.
% However, you can already fill in your author names and affiliations in the \verb|\author| block in the preamble following the example of the template because the class will remove it as long as the option \textsf{accepted} is not passed to the class.
% Nevertheless, make sure any other information in the paper does not disclose your identity, for example URLs to supplementary material.

% \subsection{Sectioning}
% Three numbered sectioning commands are provided: \verb|\section|, \verb|\subsection|, and \verb|\subsubsection|.
% Please respect their order, so do not put a \verb|\subsubsection| directly beneath a \verb|\section|.
% One unnumbered sectioning command is provided, \verb|\paragraph|.
% It can be used directly below any numbered section level.
% Do not use any other sectioning commands.

% \subsubsection{Typing the Section Titles}
% The \verb|\section| and \verb|\subsection| titles are uppercased by the class.
% Please type them in title case.
% (This is used in the PDF bookmarks.)
% Please also write the \verb|\subsubsection| titles in title case.

% \paragraph{What is title case?}
% \href{https://en.wikipedia.org/wiki/Title_case}{Wikipedia} explains:
% \begin{quote}
%     Title case or headline case is a style of capitalization used for rendering the titles of published works or works of art in English.
%     When using title case, all words are capitalized except for ‘minor’ words (typically articles, short prepositions, and some conjunctions) unless they are the first or last word of the title.
% \end{quote}

% \subsection{References, Citations, Footnotes}\label{sec:etc}
% \subsubsection{Cross-Referencing}
% Always use \verb|\label| and \verb|\ref|—or a command with a similar effect—when cross-referencing.
% For example, this subsection is Section~\ref{sec:etc}.

% \subsubsection{Citations}
% Citations should include the author's last name and year.
% They should be part of the sentence.
% An example parenthetical citation: “Good introductions to the topic are available \citep{latexcompanion}.”
% An example textual citation: “\citet{einstein} discusses electrodynamics of moving bodies.”
% Do not use a parenthetical citation where a textual one is appropriate.
% An example of what \emph{not} to do: “\citep{einstein} discusses electrodynamics of moving bodies.”

% We strongly advise to use reference list software such as Bib\TeX{} and a citation package such as \textsf{natbib}.
% The reference style you use should be compatible with the author-year citations.
% Both the citation style and reference style used should be consistent.

% For the original submission, take care not to reveal the authors' identity through the manner in which one's own previous work is cited.
% For example, writing
% “I discussed electrodynamics of moving bodies before \citep{einstein}.” would be inappropriate, as it reveals the author's identity.
% Instead, write “\citet{einstein} discussed electrodynamics of moving bodies.”

% \subsubsection{Footnotes}
% You can include footnotes in your text.\footnote{
%     Use footnotes sparingly, as they can be distracting, having readers skip back and forth between the main text and the foot of the page.
% }
% The footnote mark should follow the fragment to which it refers, so a footnote\footnote{
%     A footnote is material put at the foot of a page.
% }
% for a word has a footnote mark attached to that word and a footnote for a phrase or sentence has a footnote mark attached to the closing punctuation.

% \section{Math}\label{sec:math}
% The class file does not load any math support package like \textsf{amsmath}\footnote{%
%   See the \textsf{amsmath} documentation at \url{https://ctan.org/pkg/amsmath} for further details.
% }.
% We advise using the \textsf{mathtools}\footnote{%
%   See the \textsf{mathtools} documentation at \url{https://ctan.org/pkg/mathtools} for further details.
% }
% package, which extends \textsf{amsmath} with fixes and even more useful commands.
% Feel free to load other support packages for symbols, theorems, etc.

% Use the \textsf{amsmath} environments for displayed equations.
% So, specifically, use the \texttt{equation} environment instead of \verb|$$...$$| and the \texttt{align} environment instead of \texttt{eqnarray}.\footnote{For reasons why you should not use the obsolete \texttt{eqnarray} environment, see Lars Madsen, \textit{Avoid eqnarray!} TUGboat 33(1):21--25, 2012.}
% An \texttt{equation}:
% \begin{equation}\label{eq:example}
%   0 = 1 - 1.
% \end{equation}
% Two \texttt{align}'ed equations:
% \begin{align*} % no numbers with starred version
%   1 + 2 &= 3,\\
%   1 - 2 &= -1.
% \end{align*}
% Equations can also be put inline, of course.
% For example, Equation~\eqref{eq:example}: \(0=1-1\). % $0=1-1$ also works
% (Notice that both inline and displayed math are part of the sentence, so punctuation should be added to displayed math.)

% The \textsf{amsmath} and \textsf{mathtools} packages provide a lot of nice functionality, such as many common math operators, e.g., \(\sin\) and \(\max\), and also commands for defining new ones.

% \section{Floats}\label{sec:floats}
% Floats, such as figures, tables and algorithms, are moving objects and are supposed to float to the nearest convenient location.
% Please do not force them to go in the middle of a paragraph.
% They must respect the column width.

% Two-column floats are possible.
% They appear at the top of the next page, so strategic placement may be necessary.
% For an example, see Figure~\ref{fig:tikz}.
% They may not enter the margins.
% \begin{figure*}
%     \centering
%     \begin{tikzpicture}[xscale=1.5]
%         \coordinate (origin);
%         \draw[->] (origin) -- +(1cm,0) node[below] {$x$};
%         \draw[->] (origin) -- +(0,1cm) node[left] {$y$};
%         \fill[gray] (45:1cm) circle[radius=.2cm];
%     \end{tikzpicture}
%     \caption{A Nice Filled Ellipse with a Pair of Coordinate Axes.}\label{fig:tikz}
% \end{figure*}

% All material in floats should be legible and of good quality.
% So avoid very small or large text and pixelated or fuzzy lines.

% \subsection{Figures}\label{sec:figures}
% Figures should go in the \texttt{figure} environment and be centered therein.
% The caption should go below the figure.
% Use \verb|\includegraphics| for external graphics files but omit the file extension.
% Supported formats are \textsf{pdf} (preferred for vector drawings and diagrams), \textsf{png} (preferred for screenshots), and \textsf{jpeg} (preferred for photographs).
% Do not use \verb|\epsfig| or \verb|\psfig|.
% If you want to scale the image, it is better to use a fraction of the line width rather than an explicit length.
% For example, see Figure~\ref{fig:Eindhoven}.
% \begin{figure}
%   \centering
%   \includegraphics[width=0.7\linewidth,page=3]{Eindhoven}
%   \caption{A View of a Nice City.}\label{fig:Eindhoven}
% \end{figure}

% Do not use \verb|\graphicspath|.
% If the images are contained in a subdirectory, specify this when you include the image, for example \verb|\includegraphics{figures/mypic}|.

% \subsection{Tables}\label{sec:tables}
% Tables should go in the \texttt{table} environment and be centered therein.
% The caption should go above the table and be in title caps.
% For an example, see Table~\ref{tab:data}.
% \begin{table}
%     \centering
%     \caption{An Interesting Table.}\label{tab:data}
%     \begin{tabular}{rl}
%       \toprule % from booktabs package
%       \bfseries Dataset & \bfseries Result\\
%       \midrule % from booktabs package
%       Data1 & 0.12345\\
%       Data2 & 0.67890\\
%       Data3 & 0.54321\\
%       Data4 & 0.09876\\
%       \bottomrule % from booktabs package
%     \end{tabular}
% \end{table}

% \subsection{Algorithms}\label{sec:algorithms}
% You can load your favorite algorithm package, such as \textsf{algorithm2e}\footnote{See the \textsf{algorithm2e} documentation at \url{https://ctan.org/pkg/algorithm2e}.}.
% Use the environment defined in the package to create a centered float with an algorithm inside.

% \section{Back Matter}
% There are some final, special sections that come at the back of the paper, in the following order:
% \begin{itemize}
%   \item Author Contributions
%   \item Acknowledgements
%   \item References
% \end{itemize}
% They all use an unnumbered \verb|\subsubsection|.

% For the first two, special environments are provided.
% (These sections are automatically removed for the anonymous submission version of your paper.)
% The third is the ‘References’ section.
% (See below.)

% (This ‘Back Matter’ section itself should not be included in your paper.)

% \begin{contributions} % will be removed in pdf for initial submission,
%                       % so you can already fill it to test with the
%                       % ‘accepted’ class option
%     Briefly list author contributions.
%     This is a nice way of making clear who did what and to give proper credit.

%     H.~Q.~Bovik conceived the idea and wrote the paper.
%     Coauthor One created the code.
%     Coauthor Two created the figures.
% \end{contributions}

% \begin{acknowledgements} % will be removed in pdf for initial submission,
%                          % so you can already fill it to test with the
%                          % ‘accepted’ class option
%     Briefly acknowledge people and organizations here.

%     \emph{All} acknowledgements go in this section.
% \end{acknowledgements}

% \bibliography{uai2022-template}

% \appendix
% % NOTE: necessary when ptmx or no mathfont class option is given
% \providecommand{\upGamma}{\Gamma}
% \providecommand{\uppi}{\pi}
% \section{Math font exposition}
% How math looks in equations is important:
% \begin{equation*}
%   F_{\alpha,\beta}^\eta(z) = \upGamma(\tfrac{3}{2}) \prod_{\ell=1}^\infty\eta \frac{z^\ell}{\ell} + \frac{1}{2\uppi}\int_{-\infty}^z\alpha \sum_{k=1}^\infty x^{\beta k}\mathrm{d}x.
% \end{equation*}
% However, one should not ignore how well math mixes with text:
% The frobble function \(f\) transforms zabbies \(z\) into yannies \(y\).
% It is a polynomial \(f(z)=\alpha z + \beta z^2\), where \(-n<\alpha<\beta/n\leq\gamma\), with \(\gamma\) a positive real number.

\end{document}
