%\documentclass{uai2023} % for initial submission
\documentclass[accepted]{uai2023} % after acceptance, for a revised
                                    % version; also before submission to
                                    % see how the non-anonymous paper
                                    % would look like
%% There is a class option to choose the math font
% \documentclass[mathfont=ptmx]{uai2023} % ptmx math instead of Computer
                                         % Modern (has noticable issues)
% \documentclass[mathfont=newtx]{uai2023} % newtx fonts (improves upon
                                          % ptmx; less tested, no support)
% NOTE: Only keep *one* line above as appropriate, as it will be replaced
%       automatically for papers to be published. Do not make any other
%       change above this note for an accepted version.

%% Choose your variant of English; be consistent
\usepackage[american]{babel}
% \usepackage[british]{babel}

\usepackage{subfig}
\usepackage{hyperref}
\usepackage{url}
\usepackage{multirow, booktabs}
\usepackage{makecell}
\usepackage{graphicx}
\usepackage{color}
\usepackage{amsfonts} 
\usepackage{tikz}
\usepackage{amsthm}
\newtheorem{proposition}{Proposition}
\newtheorem{theorem}{Theorem}
\newtheorem{definition}{Definition}
\usepackage{xspace}
\newcommand{\myparagraph}[1]{\vspace{1ex}\noindent{\bf #1}}
\def\name{\textsf{Mnemonist}\xspace}
\def\scale{\textit{scale}\xspace}
%\def\name{\textsf{TRADR}\xspace}
\usepackage[noend]{algorithmic}
\usepackage[algoruled,nofillcomment,algo2e]{algorithm2e}

\def\attack{\textsf{Mnemonist-RecoNN}\xspace}
\newcommand{\lo}{\mathcal{L}} %Loss Function
\newcommand{\W}{\mathbf{W}} %Weights of a layer
\newcommand{\w}{\mathbf{w}} %Row weights of a layer
\newcommand{\bi}{\mathbf{b}} %Bias of a layer
\newcommand{\x}{\mathbf{x}} %A data point
\newcommand{\h}{\mathbf{h}} %pre-activation
\newcommand{\ac}{\mathbf{a}} %activation
\newcommand{\init}{\mathbf{W}_{\text{init}}} % initial
\newcommand{\rec}{{\text{RecoNN}}\xspace} % Reconstructor Neural Network
\newcommand{\good}{{``good''}\xspace}
\newcommand{\bad}{{``bad''}\xspace}


%% Some suggested packages, as needed:
\usepackage{natbib} % has a nice set of citation styles and commands
    \bibliographystyle{plainnat}
    \renewcommand{\bibsection}{\subsubsection*{References}}
\usepackage{mathtools} % amsmath with fixes and additions
% \usepackage{siunitx} % for proper typesetting of numbers and units
\usepackage{booktabs} % commands to create good-looking tables
\usepackage{tikz} % nice language for creating drawings and diagrams

\newcommand{\swap}[3][-]{#3#1#2} % just an example

\title{\name: Locating Model Parameters that Memorize Training Examples}

\author[1]{\href{mailto:<a.shahinshamsabadi@turing.ac.uk>}{Ali Shahin Shamsabadi}}
\author[2]{Jamie Hayes}
\author[2]{Borja Balle}
\author[1,3]{Adrian Weller}

\affil[1]{%
    The Alan Turing Institute
}
\affil[2]{%
    Google DeepMind
}
\affil[3]{%
    University of Cambridge
  }
  
  \begin{document}
\maketitle


\begin{abstract}


Recent work has shown that an adversary can reconstruct training examples given access to the parameters of a deep learning image classification model.
We show that the quality of reconstruction depends heavily on the type of activation functions used. In particular, we show that ReLU activations lead to much lower quality reconstructions than smooth activation functions. We explore whether this phenomenon is a fundamental property of models with ReLU activations or a weakness of current attack strategies.
We first study the training dynamics of small MLPs with ReLU activations and identify redundant model parameters that do not memorize training examples.
Building on this, we propose our \name method, which is able to detect redundant model parameters, and then guide current attacks to focus on informative parameters to improve the quality of reconstructions of training examples from ReLU models. 

\end{abstract}

\section{Introduction}
\label{sec:introduction}

Machine Learning (ML) models have the capacity to memorize examples from training data~\citep{zhang2017understanding}. 
Consequently, releasing ML models can be a risk to privacy if the training data is sensitive. 
In the most serious example of a privacy breach, verbatim examples of training data points can be reconstructed~\citep{haim2022reconstructing,informed_adversary,guo2022bounding,fowl2021robbing}.
For example, it has been shown that an informed adversary with knowledge of all the data points in a training set except one (the target point) can reconstruct the target point if they have access to the model parameters~\citep{informed_adversary}. 
To do this, the informed adversary trains a Reconstructor Neural Network (\rec) that receives \emph{all} parameters from the model as the input and reconstructs the target point as the output.   

We show that this adversary does not reliably reconstruct training examples if the model is trained with ReLU~\citep{glorot2011deep} activation functions. 
This is a significant weakness of the attack since ReLU activations are very common. Their popularity is in part because they often yield superior performance over models trained with other activation functions, and also due to their faster convergence rates~\citep{krizhevsky2017imagenet,glorot2011deep}. 
ReLU activations also negatively affect the success of attacks in other settings such as federated learning~\citep{wei2020framework} where the attack relies on access to intermediate model updates. \citet{haim2022reconstructing} replaced ReLU activations with Sigmoid activations to run their training data reconstruction attack because ReLU ``contains flat regions which are hard to optimize''. In order to understand the extent to which models using ReLU activations are vulnerable to reconstruction attacks, we ask the following questions:
\begin{center}
    \emph{Q1: Why do ReLU activations lead to much lower quality reconstructions compared to smooth activations?}\\
    \emph{Q2: How can we improve the quality of the reconstruction of target points from models with ReLU activations?}
\end{center}

Our approach to answering these questions is to learn how important each parameter of the model is to the success of the reconstruction of the target point.
We then design new attack methods that shift the focus of the attack towards the parameters that are identified as important. 

First, we analytically demonstrate that \emph{not all} parameters in ReLU activated models store information about the target point. Intuitively, this is because ReLU deactivates neurons with negative pre-activations in the forward pass; in the backward pass, the gradient of the loss with respect to the incoming parameters of these inactive neurons is zero, so these parameters are not updated and store no information about the input. This is not the case for models with smooth activations (including Sigmoid), as their derivatives are always nonzero. We empirically demonstrate this behavior by studying the training dynamics of the models, namely the per-example gradients and the number of training examples stored in each parameter across all shadow models.  


Second, we design an approach we call \name, which distinguishes between parameters that contain no information about the target point and parameters that contain a lot of information about the target point.
\name is black-box in the sense that it does not need access to intermediate model updates throughout training, and is instantiated by extending the approach of Local Interpretable Model-Agnostic Explanations (LIME)~\citep{ribeiro2016should} to operate in the parameter space. \name starts by semantically grouping parameters into \emph{superparameters}. These superparameters each represent all incoming parameters to each individual neuron. Then, \name learns the importance of each superparameter for data reconstruction as coefficients of an interpretable model which is trained on the response of \rec to the absence or presence of each superparameter. 


Finally, based on \name, we introduce an attack, called \attack, that can improve the quality of reconstructions of target points from models trained with ReLU activations by training \rec on \emph{only} informative parameters. We highlight the following contributions:
\begin{itemize}[leftmargin=*]
    \item We characterize the existence of redundant parameters as a fundamental property of models with ReLU activations by analyzing their training dynamics both theoretically and empirically. 
    \item We propose a black-box explanation technique, \name, which identifies parameters that are likely to store information about the target point we aim to reconstruct. We provide theoretical and empirical justifications for the performance of \name.
    \item We show that naively applying the attack proposed by \citet{informed_adversary} to models with ReLU activations results in much lower quality reconstructions than applying it to models with smooth activations. We improve the quality of reconstruction by applying the attack to only a subset of parameters that are identified by \name as informative. \end{itemize}




%


\section{Threat Model and Setup}
\label{sec:background}



\begin{table}[t!]\caption{Notation.}
\begin{center}
\setlength\tabcolsep{1pt}
\small
\begin{tabular}{llll}
\toprule
 & Meaning &  & Meaning\\\midrule
$\init$ & Initial model param.&
$\W$  & Released model param.\\
$\bar{\W}_i$ & Shadow model param. & $\bar{z}$ & A public target example \\
$\mathbf{w}^{(s)}$ & Model superparam. & $\W'$ & Perturbed model \\
$M$ & \# perturbed models & $\mathbf{b}$ & Binary mask \\
$D_{-}$ & Fixed dataset &
$N-1$ & Size of fixed set \\
$K$ & \# shadow models & $\rec$ & Reconstructor network \\
$T$ & \# training steps & $\mathcal{S}$ & Adversary side knowledge\\
$\mathcal{L}_{Rec}$ & Reconstruction loss & $\boldsymbol\phi$ & \rec parameters \\
\bottomrule
\end{tabular}
\end{center}
\label{tab:TableOfNotationForMyResearch}
\end{table}


Recent work on training data reconstruction attacks has  focused on attacking federated learning set-ups where an adversary has access to all intermediate model updates~\citep{boenisch2021curious,wen2022fishing,fowl2021robbing}.
Similar to the threat model considered by \citet{informed_adversary}, we assume the adversary does not have access to intermediate model updates; the adversary can only observe the initial and final model parameters.
This restriction on the adversary means that our work is applicable to approaches beyond federated learning.

More formally, we assume a model developer trains an ML model on a supervised learning task, which we refer to as the \emph{released} model. 
They use an off-the-shelf optimization algorithm such as SGD with momentum to transform a set of initial model parameters, $\init$, to a set of final model parameters $\W$, by training for $T$ steps on a dataset {${D_{-}\cup\{z\}}$}, where $D_{-}$ is referred to as the \emph{fixed dataset}, $z$ is the target point, {${|D_{-}|=N-1}$} and both $z$ and all points in $D_{-}$ are sampled from an input space $\mathcal{Z}$.
Table~\ref{tab:TableOfNotationForMyResearch} describes all necessary notation used throughout this paper.

Following the terminology used by \citet{informed_adversary}, we assume the adversary is \emph{informed}.
That is, the adversary has knowledge of the tuple ($D_{-}$, $\init$, $\W$, $\mathcal{S}$), where $\mathcal{S}$ represents the side-knowledge available to the adversary.
We assume $\mathcal{S}$ includes all training hyperparameters such as $T$, the mini-batch size, the optimizer, and the learning rate of the optimizer, as well as the initial model parameters $\init$.
However, we do not assume the adversary knows the randomness used to sample mini-batches from $D_{-}\cup\{z\}$ at each step, nor do they have access to intermediate model parameters. The goal of the adversary is to reconstruct the target point $z$ given ($D_{-}$, $\init$, $\W$, $\mathcal{S}$). 
To do this, we run the attack proposed by \citet{informed_adversary}. This attack is designed based on the intuition that the impact of the private target point $z$ on the released model $\W$ trained on $D_{-}\cup\{z\}$ is similar to the impact of a public target point $\bar{z}$ on a shadow model $\bar{\W}$ trained on $D_{-}\cup\{\bar{z}\}$.
In particular, the attack consists of the following three stages~\citep{informed_adversary}:


\begin{enumerate}
    \item \textbf{Training shadow models to collect information about the impact of training examples on model parameters.} We assume the adversary has access to a public dataset $\bar{Z}=\{\bar{z}_i\}_{i=1}^K$ containing $K$ data points disjoint from $D_{-}\cup\{z\}$. We train $K$ shadow models, $\{\bar{\W}_i\}_{i=1}^K$, where each shadow model $\bar{\W}_i$ is trained on the fixed dataset plus the $i$-th public data point, $D_{-}\cup\{\bar{z}_i\}$ using side-knowledge $\mathcal{S}$ (including the same initial parameters $\init$ and optimizer as the ones used for the released model).
    \item \textbf{Training a Reconstructor Neural Network to output training examples from model parameters.} We train a Reconstructor Neural Network, $\rec(\cdot)$, whose inputs lie in the parameter space of the shadow models and whose outputs lie in the input space of the shadow models. In particular, the \rec receives $\bar{\W}_i$ as an input and tries to reconstruct its corresponding target point $\bar{z}_i$ as an output by minimizing the sum of the Mean Squared Error (MSE) and the Mean Absolute Error (MAE) between the target $\bar{z}_i$ and its reconstruction $\rec(\bar{\W}_i)$, as {$\mathcal{L}_{Rec}={\text{MSE}\big(\rec(\bar{\W}_i),\bar{z}_i\big)+\text{MAE}\big(\rec(\bar{\W}_i),\bar{z}_i\big)}$}.
    \item \textbf{Producing a candidate reconstruction for the target point.} We obtain a reconstruction candidate for the target point $z$ by inputting the released model parameters $\W$ to the trained \rec.
\end{enumerate}
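
Stages 2 and 3 can be summarized in one expression. As a sketch, and assuming that the per-example loss is averaged uniformly over the $K$ shadow models and that MSE and MAE are means over the $d$ input coordinates (both are assumptions made here for illustration, not details stated above), the \rec parameters $\boldsymbol\phi$ solve
\begin{align*}
    \boldsymbol\phi^\star\in\operatorname*{arg\,min}_{\boldsymbol\phi}\;\frac{1}{K}\sum_{i=1}^{K}\Big[&\tfrac{1}{d}\big\|\text{RecoNN}_{\boldsymbol\phi}(\bar{\W}_i)-\bar{z}_i\big\|_2^2\\
    &+\tfrac{1}{d}\big\|\text{RecoNN}_{\boldsymbol\phi}(\bar{\W}_i)-\bar{z}_i\big\|_1\Big]\enspace,
\end{align*}
where the first term is the MSE and the second the MAE. The reconstruction candidate of stage 3 is then $\text{RecoNN}_{\boldsymbol\phi^\star}(\W)$. For CIFAR10 images, $d=32\times32\times3=3072$.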


While the assumption that such an informed adversary exists is perhaps unrealistic for practical attacks, the attacks we study in this work are designed to measure the maximum amount of privacy leakage available to such an adversary. 
As such, our work is similar in spirit to the long line of research on auditing the privacy of machine learning models~\citep{lu2022general, nasr2021adversary, jagielski2020auditing, zanella2022bayesian}, and should be viewed as complementary to research on reconstructing training data from federated learning systems, which leans more on the practical side.
Finally, we note that recent work by \citet{haim2022reconstructing} also investigates reconstructing training data without adversarial access to intermediate model updates.
However, their work is untargeted in that they try to reconstruct \emph{any} training data points, while our work aims to reconstruct a specific training example.
Furthermore, their reconstruction attacks are confined to simple linear models, while our attacks operate on fully-connected neural networks; \citet{informed_adversary} have already shown that closed form solutions for reconstruction attacks with informed adversaries exist on convex models.


\myparagraph{Setup.} We focus on fully-connected neural networks (FCNN) and CIFAR10 following the baseline approach of~\citet{informed_adversary}\footnote{See Section~\ref{sec:discussion} for a discussion on the choice of the dataset and model architectures.}.  
Our experimental setup is summarized in Table~\ref{tab:setup} and, unless stated otherwise, all experimental results are reported by averaging across 1,000 reconstructions where the targets are selected from the CIFAR10 test set.
Regarding performance measures, we quantify the quality of reconstruction by computing the MAE+MSE between the target point and its reconstructed point. 







\section{Reconstructing Training Data from Models with ReLU Activations  Is Hard}
\label{sec:motivation}

\begin{table}[t]\caption{Experimental setup.}
    \centering
    \small
    \setlength\tabcolsep{1pt}
    \begin{tabular}{|ccc|}
    \Xhline{3\arrayrulewidth} 
  \multicolumn{3}{|c|}{Released/Shadow models}  \\
          Architecture & Optimizer & \#steps   \\
       FCNN (layer:4, width:10) & SGD+Momentum (Full-Batch) & $T$=$40$\\
       \hline
    \end{tabular}
    \centering
        \begin{tabular}{|cc|ccc|}
        \hline
  \multicolumn{2}{|c|}{CIFAR10 dataset} & \multicolumn{3}{c|}{\rec} \\
         Fixed size & Shadow size  & Architecture & Loss & \#steps \\
       $N-1$=$10k$ & $K$=$40k$ & Transposed CNN & MAE+MSE & 200 \\
      \Xhline{3\arrayrulewidth} 
    \end{tabular}
    \label{tab:setup}
\end{table}
We examine issues associated with training data reconstructions mounted against models with ReLU activations. 
In particular, we compare the effect of changing the model activation functions from Sigmoid\footnote{Other smooth activation functions give similar results to Sigmoid in terms of data reconstruction (Table V in~\citet{informed_adversary}).} to ReLU
on the quality of reconstructing target points from model parameters.
\begin{figure}
    \centering
    \begin{tabular}{c}
      \includegraphics[width=0.3\textwidth]{figures/relu_vs_sigmoid/Sensitivity_ReluvsSig.pdf}
        \end{tabular}
    \caption{Reconstruction loss of target points in ReLU versus Sigmoid activated models across 20 different initializations. ReLU activations lead to higher reconstruction loss and higher variance compared to Sigmoid activations.}
    \label{fig:SensitivityGap}
\end{figure}

Figure~\ref{fig:SensitivityGap} shows the impact of the activation function in the released model on the loss of reconstructing target points from the final released model parameters across initializations ($\init$). The results demonstrate that the reconstruction of target points from ReLU activated released models is, in general, harder than from Sigmoid activated released models. 
Examples of reconstructions are visualized in Figure~\ref{fig:visualization} to help the interested reader calibrate how numeric reconstruction losses map to the visual quality of reconstructions; in general, one can confidently pair the reconstruction with the target if it has a reconstruction loss smaller than 0.15.
The quality of target points reconstructed from Sigmoid activated models is better than that of target points reconstructed from ReLU activated models. Figure~\ref{fig:SensitivityGap} also shows that the gap between ReLU and Sigmoid varies widely with the randomness of the parameter initialization. 
The reconstruction loss of models with ReLU activations ranges from 0.05 to 0.29, while Sigmoid activations lead to consistently low losses within a narrow range around 0.03. 




\begin{figure}[t]
    \begin{tabular}{ccc}
    \scriptsize Target & \scriptsize ReLU &  \scriptsize Sigmoid \\
       \includegraphics[width=0.12\textwidth]{figures/relu_vs_sigmoid/imgs/22Test_Target_ShadowInitSeed301_ShadowEpochs40_ShadowDataSeed42_NumShadows40000.pdf}
       &
       \includegraphics[width=0.12\textwidth]{figures/relu_vs_sigmoid/imgs/22Test_ActRelu_ShadowInitSeed301_ShadowEpochs40_ShadowDataSeed42_NumShadows40000.pdf}
       &
       \includegraphics[width=0.12\textwidth]{figures/relu_vs_sigmoid/imgs/22Test_ActSigmoid_ShadowInitSeed301_ShadowEpochs40_ShadowDataSeed42_NumShadows40000.pdf}\\
       & \scriptsize $\mathcal{L}_{Rec}=.3072$ &  \scriptsize $\mathcal{L}_{Rec}=.0189$ \\
      \includegraphics[width=0.12\textwidth]{figures/relu_vs_sigmoid/imgs/0Test_Target_ShadowInitSeed301_ShadowEpochs40_ShadowDataSeed42_NumShadows40000.pdf}
       &
       \includegraphics[width=0.12\textwidth]{figures/relu_vs_sigmoid/imgs/0Test_ActRelu_ShadowInitSeed301_ShadowEpochs40_ShadowDataSeed42_NumShadows40000.pdf}
       &
       \includegraphics[width=0.12\textwidth]{figures/relu_vs_sigmoid/imgs/0Test_ActSigmoid_ShadowInitSeed301_ShadowEpochs40_ShadowDataSeed42_NumShadows40000.pdf}\\
       & \scriptsize $\mathcal{L}_{Rec}=.2996$ &  \scriptsize $\mathcal{L}_{Rec}=.0289$ \\
        \includegraphics[width=0.12\textwidth]{figures/relu_vs_sigmoid/imgs/8Test_Target_ShadowInitSeed502_ShadowEpochs40_ShadowDataSeed42_NumShadows40000.pdf}
       &
       \includegraphics[width=0.12\textwidth]{figures/relu_vs_sigmoid/imgs/8Test_ActRelu_ShadowInitSeed502_ShadowEpochs40_ShadowDataSeed42_NumShadows40000.pdf}
       &
       \includegraphics[width=0.12\textwidth]{figures/relu_vs_sigmoid/imgs/8Test_ActSigmoid_ShadowInitSeed502_ShadowEpochs40_ShadowDataSeed42_NumShadows40000.pdf}\\
       & \scriptsize $\mathcal{L}_{Rec}=.1161$ &  \scriptsize $\mathcal{L}_{Rec}=.0292$ \\
     \includegraphics[width=0.12\textwidth]{figures/relu_vs_sigmoid/imgs/27Test_Target_ShadowInitSeed502_ShadowEpochs40_ShadowDataSeed42_NumShadows40000.pdf}
       &
       \includegraphics[width=0.12\textwidth]{figures/relu_vs_sigmoid/imgs/27Test_ActRelu_ShadowInitSeed502_ShadowEpochs40_ShadowDataSeed42_NumShadows40000.pdf}
       &
       \includegraphics[width=0.12\textwidth]{figures/relu_vs_sigmoid/imgs/27Test_ActSigmoid_ShadowInitSeed502_ShadowEpochs40_ShadowDataSeed42_NumShadows40000.pdf}\\
       & \scriptsize $\mathcal{L}_{Rec}=.0533$ &  \scriptsize $\mathcal{L}_{Rec}=.0167$ \\
    \end{tabular}
        \caption{Original target examples and their reconstruction candidates obtained by training \rec on ReLU and Sigmoid activated fully connected models. Reconstructions from Sigmoid activated models are closer to the original target examples than those from ReLU activated models.}
    \label{fig:visualization}
\end{figure}

\looseness=-1 We consider the quality of reconstruction in the Sigmoid case to be the benchmark,
and identify initializations for released models that lead to a small or large gap between ReLU and Sigmoid reconstruction losses.
We refer to these two groups as \emph{good} and \emph{bad} initializations.
Using these two initialization groups, we study the reconstruction loss of individual examples (i.e., per-sample reconstructions) as well as per-class reconstructions in which the reconstruction losses of data points belonging to the same class are aggregated.

Per-class reconstruction losses in Figure~\ref{tab:perclassrecloss} show that, in general, samples belonging to all classes are harder to reconstruct in the ReLU case than the Sigmoid case.
For \good initializations, ReLU activations still lead to larger reconstruction losses than Sigmoid activations; however, this gap is much larger for \bad initializations. 
Reconstruction losses across classes vary slightly more in the ReLU case than the Sigmoid case, and this variation increases with \bad initializations. Across both initializations and activations, we observe that some data points and classes are inherently more difficult to reconstruct than others -- possibly because some classes are less complex, e.g., the airplane class has many images with blue skies, while the truck class images have more intricate backgrounds on average.

\looseness=-1 Figure~\ref{fig:PerSampleReluvsSigmoid} shows the effect of changing the released model activations from Sigmoid to ReLU on per-sample reconstruction losses using both \good and \bad initializations. In general, the histogram of per-sample reconstruction losses is more concentrated for Sigmoid than for ReLU. 
We plot the histogram across initializations using the same x-axis scale in the first column of Figure~\ref{fig:PerSampleReluvsSigmoid} to highlight that the spread of reconstruction losses on ReLU models drops significantly when transitioning from \bad to \good initializations. 
For  \good initializations there is a large overlap between ReLU and Sigmoid reconstruction losses, which is not the case for \bad initializations.
The second column of Figure~\ref{fig:PerSampleReluvsSigmoid} shows that a few samples are reconstructed better in the ReLU case than the Sigmoid case when we use the \good initialization.


\begin{figure}[t]
    \centering
    \bad initialization
    \includegraphics[width=0.5\textwidth]{figures/relu_vs_sigmoid/per_class/class_wise_badinitialization.pdf}\\
    \good initialization
    \includegraphics[width=0.5\textwidth]{figures/relu_vs_sigmoid/per_class/class_wise_goodinitialization.pdf}\\
    \caption{Impact of the choice of the activation function on the per-class reconstruction loss across two different initializations. Reconstructing target points from models with ReLU activations is less successful than from models with Sigmoid activations, independent of the class to which the target points belong.}
    \label{tab:perclassrecloss}
\end{figure}


\begin{figure}[t]
    \centering
    \setlength\tabcolsep{1pt}
    \begin{tabular}{cc}
    \multicolumn{2}{c}{\bad initialization}\\
    \includegraphics[width=0.23\textwidth]{figures/relu_vs_sigmoid/per_sample/histogram_badinitialization.pdf}&
    \includegraphics[width=0.23\textwidth]{figures/relu_vs_sigmoid/per_sample/relative_badinitialization.pdf}\\
    \multicolumn{2}{c}{\good initialization}\\
    \includegraphics[width=0.23\textwidth]{figures/relu_vs_sigmoid/per_sample/histogram_goodinitialization.pdf}&
    \includegraphics[width=0.23\textwidth]{figures/relu_vs_sigmoid/per_sample/relative_goodinitialization.pdf}\\
        \end{tabular}
    \caption{Comparing the effect of changing the activation function from Sigmoid to ReLU on the per-sample reconstruction loss using \bad and \good initializations. ReLU activations lead to higher per-sample reconstruction loss compared to Sigmoid activations, especially in the \bad initialization case.}
    \label{fig:PerSampleReluvsSigmoid}
\end{figure}




\section{Why is Training Data Reconstruction from ReLU Activated Models Hard?}
\label{sec:why}

\looseness=-1 We theoretically and empirically analyze why ReLU activations lead to higher reconstruction loss and higher variance across parameter initializations than Sigmoid activations. 


\subsection{Existence of Redundant Model Parameters In Theory}
\label{sec:theory}
Consider a fully connected layer that receives an input {${\x \in \mathbb{R}^{n_x}}$} and outputs the activation {$\ac = R(\h) \in \mathbb{R}^{n_h}$}, which is computed by applying a non-linear activation function $R(\cdot)$ to the pre-activation {${\h = \W\x+\bi \in \mathbb{R}^{n_h}}$}. The parameters $\W \in \mathbb{R}^{n_h \times n_x}$ and bias $\bi \in \mathbb{R}^{n_h}$ are initialized randomly. The parameter matrix $\W$ contains as many rows as there are neurons at the output of the fully connected layer, and each row $\w^l$ collects all the edges connecting $\x$ to the output neuron $h^l$.
At each training step $t$, each row $\w^l$ is updated based on the gradient of the loss $\lo$ w.r.t. this row as $\w_{t+1}^l=\w_{t}^l-\text{lr}\frac{\partial \lo}{\partial \w_t^l}$, where $\text{lr}$ is the learning rate.
The gradient is obtained as
\begin{align}
    \frac{\partial \lo}{\partial \w_t^l} &=\frac{\partial \lo}{\partial a_t^l} \cdot \frac{\partial a_t^l}{\partial h_t^l} \cdot \frac{\partial h_t^l}{\partial \w_t^l} \nonumber \\
    &=\frac{\partial \lo}{\partial a_t^l} \cdot R'(h_t^l) \cdot \x
    \enspace,
\end{align}
where $R'$ is the derivative of the activation function $R$.


For the Sigmoid activation function, $R'$ is always non-zero, thus each row stores a copy of $\mathbf{x}$. However, recall that for the ReLU activation we have $R'(h) = 0$ whenever $h \leq 0$, and $R'(h) = 1$ otherwise. This means that

\begin{align}
    \frac{\partial \lo}{\partial \w_t^l} &=
    \begin{cases}
    0 &  \text{if} \quad h_t^l \leq 0\\
    \scale \cdot \x & \text{otherwise}\enspace,
    \end{cases}
    \label{eq:scale}
\end{align}

where $\scale=\frac{\partial \lo}{\partial a_t^l}$.
In particular, when $h_t^l$ is positive, the update step will store a copy of $\x$ (proportional to $\text{lr} \frac{\partial \lo}{\partial a_t^l}$) in $\w_{t+1}^l$, but otherwise the parameter update will be independent of the input $\x$.
In the latter case, the update of row $\w^l$ does not store any information useful to perform a reconstruction attack against target $\x$ -- we call such a row \emph{redundant}. The existence of these redundant rows decreases the quality of reconstruction. In addition to this, the number of redundant parameters varies across different initializations, resulting in high variance.
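
To make the notion of a stored copy concrete, the update rule can be unrolled under two simplifying assumptions made here only for illustration: plain full-batch gradient descent without momentum, and a first-layer row $\w^l$ whose input is the training example itself. Writing $\x_j$ for the input of training point $j$ and $c_t^l(\x_j)=\frac{\partial \lo_j}{\partial a_t^l}\, R'\big(h_t^l(\x_j)\big)$ for its coefficient at step $t$, where $\lo_j$ denotes the contribution of point $j$ to the loss (the symbols $c_t^l$ and $\lo_j$ are introduced for this sketch), the row after $T$ steps satisfies
\begin{align*}
    \w_T^l-\w_0^l \;=\; -\text{lr}\sum_{t=0}^{T-1}\;\sum_{j=1}^{N} c_t^l(\x_j)\,\x_j \enspace.
\end{align*}
The target point contributes the single term $-\text{lr}\big(\sum_{t=0}^{T-1} c_t^l(\x_z)\big)\,\x_z$, where $\x_z$ is the input of $z$. For ReLU, $c_t^l(\x_z)=0$ at every step at which neuron $l$ is inactive on $\x_z$; if this holds for all $t$, the displacement of row $l$ contains no explicit copy of $\x_z$.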
Next, we empirically investigate how these redundant parameters manifest during the training of ReLU activated FCNNs.




\subsection{Existence of Redundant Model Parameters in Practice}

We study the training dynamics of ReLU activated models versus those of Sigmoid activated models. We focus on the first layer, where each row of parameters can store a scaled version of target points depending on the value of $\scale$ in Equation~\ref{eq:scale}. In our experiments, we efficiently compute the $\scale$ of each row $l$ at each training step $t$ using the gradient of the loss with respect to the bias of the $l$-th neuron as:
\begin{align}
    \frac{\partial \lo}{\partial b_t^l} =\frac{\partial \lo}{\partial a_t^l} \cdot \frac{\partial a_t^l}{\partial b_t^l} =\frac{\partial \lo}{\partial a_t^l}=\scale\enspace.
\end{align}
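
As a check on this identity, and using only the definitions of Section~\ref{sec:theory}, note that $a_t^l=R(h_t^l)$ and $h_t^l=\w_t^l\x+b_t^l$, so the factor $\frac{\partial a_t^l}{\partial b_t^l}$ equals $R'(h_t^l)$. For ReLU this gives
\begin{align*}
    \frac{\partial \lo}{\partial b_t^l}
    &=\frac{\partial \lo}{\partial a_t^l}\cdot R'(h_t^l)
    =\begin{cases}
    0 & \text{if} \quad h_t^l \leq 0\\
    \scale & \text{otherwise}\enspace,
    \end{cases}\\
    \frac{\partial \lo}{\partial \w_t^l}
    &=\frac{\partial \lo}{\partial b_t^l}\cdot\x \enspace.
\end{align*}
Read this way, the bias gradient coincides with $\scale$ on active neurons and vanishes on inactive ones, so it is exactly the coefficient that multiplies $\x$ in Equation~\ref{eq:scale}.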

We compute the $\scale$ of each row for all 1,000 target points, and binarize to demarcate which row stores a copy of the target input:
\begin{align}
\textit{B-scale}=
    \begin{cases}
    0 &  \text{if} \quad \scale = 0\\
    1 & \text{otherwise}\enspace,
    \end{cases}
\end{align}
where 0 implies that no information about the target point is stored, while 1 indicates that a (scaled) copy of the target point is stored in that specific row. Figure~\ref{fig:TrainingDynamicReLUvsSigmoidSeeds} shows the histogram of the sum of $\textit{B-scale}$ over all rows per target point (i.e., 
the number of rows that store each target point) over time. Almost\footnote{Figure~\ref{fig:TrainingDynamicReLUvsSigmoidSeeds} shows that none of the superparameters in the Sigmoid activated model stores two (out of 1,000) target points. We hypothesize that this is due to the saturation of Sigmoid for these two target points, whose pixel values are mostly zero (visually black).} all the parameters of models with Sigmoid activations store all the target points, while for ReLU activations, some rows store no information about the target training point. 
This effect becomes more severe at later steps, where a larger number of rows store no information about the target point for ReLU models. 
The histogram of the number of rows that store each target point is more concentrated for \good initializations than for \bad initializations.
This offers an intuitive explanation for why reconstruction becomes more difficult on ReLU models with \bad initializations:
the number of rows storing target points varies across points and time in the \bad initialization.
That is, for \bad initializations the pattern for redundant rows over training can differ wildly for two different target points, making it more difficult for the \rec to learn.
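
In symbols, and writing $\textit{B-scale}_t^l(z)$ for the binarized value of row $l$ at step $t$ on target point $z$ (an indexing introduced here for illustration), the quantity whose histogram is plotted can be written as
\begin{align*}
    S_t(z)=\sum_{l=1}^{n_h}\textit{B-scale}_t^l(z)\in\{0,1,\dots,n_h\}\enspace.
\end{align*}
Assuming the first layer has $n_h=10$ rows, as the width reported in Table~\ref{tab:setup} suggests, $S_t(z)=0$ means that no row stores $z$ at step $t$ and $S_t(z)=10$ means that every row does.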

\begin{figure}[t]
    \centering
    \setlength\tabcolsep{0.5pt}
    \begin{tabular}{ccc}
    \multicolumn{3}{c}{\bad initialization}\\
    \includegraphics[width=0.33\columnwidth]{figures/hist_grad/MLP3/HistGradShadowModels_Itr0_ActRelu_ShadowInitSeed301_ShadowEpochs40_ShadowDataSeed42_NumShadows40000.pdf}& 
\includegraphics[width=0.33\columnwidth]{figures/hist_grad/MLP3/HistGradShadowModels_Itr29_ActRelu_ShadowInitSeed301_ShadowEpochs40_ShadowDataSeed42_NumShadows40000.pdf}& 
    \includegraphics[width=0.33\columnwidth]{figures/hist_grad/MLP3/HistGradShadowModels_Itr39_ActRelu_ShadowInitSeed301_ShadowEpochs40_ShadowDataSeed42_NumShadows40000.pdf}\\
    \multicolumn{3}{c}{\good initialization}\\
    \includegraphics[width=0.33\columnwidth]{figures/hist_grad/MLP3/HistGradShadowModels_Itr0_ActRelu_ShadowInitSeed201_ShadowEpochs40_ShadowDataSeed42_NumShadows40000.pdf}& 
\includegraphics[width=0.33\columnwidth]{figures/hist_grad/MLP3/HistGradShadowModels_Itr29_ActRelu_ShadowInitSeed201_ShadowEpochs40_ShadowDataSeed42_NumShadows40000.pdf}& 
    \includegraphics[width=0.33\columnwidth]{figures/hist_grad/MLP3/HistGradShadowModels_Itr39_ActRelu_ShadowInitSeed201_ShadowEpochs40_ShadowDataSeed42_NumShadows40000.pdf}\\
    \end{tabular}
    \caption{Training dynamics of ReLU activated models versus Sigmoid activated models using different initializations. Each plot shows the histogram of the number of rows ($S$) that store each target point. Not all parameters of ReLU activated models store target points, while all parameters of Sigmoid activated models store information about target points.} 
    \label{fig:TrainingDynamicReLUvsSigmoidSeeds}
\end{figure}










\begin{figure*}[t]
    \centering
    \begin{tabular}{c}
    \includegraphics[width=0.84\textwidth]{figures/LIMEattack.pdf} 
        \end{tabular}
    \caption{An overview of \name, which quantifies the contribution of model parameters to reconstructing a target point.}
    \label{fig:LIME_BD}
\end{figure*}
\begin{figure*}[t]
    \centering
    \begin{minipage}[t]{0.63\textwidth}
    \centering
    \setlength\tabcolsep{0.5pt}
        \begin{tabular}{cc}
        \includegraphics[width=0.45\textwidth]{figures/LIME/Coe_binGrad.pdf}&
        \includegraphics[width=0.45\textwidth]{figures/LIME/Coe_scale.pdf}\\
    \end{tabular}
        \caption{Coefficients of the interpretable model in \name as a function of the number of target points stored in each superparameter across 1,000 released models (left) and the \scale of storing target points in each superparameter (right) after 1 step. \name coefficients are aligned with the training dynamics, so \name can detect important and redundant superparameters.} 
\label{fig:explainAttackstep1}
    \end{minipage}\hfill
        \begin{minipage}[t]{0.35\textwidth}
        \begin{tabular}{c}
       \includegraphics[width=0.81\textwidth]{figures/LIME/Coe_Rec.pdf}
       \end{tabular}
        \caption{Coefficients of the interpretable model in \name as a function of reconstructing targets based on each superparameter after 40 steps. \name can detect important and redundant superparameters.}
\label{fig:explainAttackstep40}
    \end{minipage}
\end{figure*}
\section{\name: finding parameters that store target examples}



\subsection{Methodology}
With a view to identifying redundant parameters that are not useful for a reconstruction attack, without accessing intermediate updates of the released model, we introduce \name, a black-box approach that explains which parameters of the released model store target points. 
\name quantifies the contribution of parameters to reconstructing each target point by extending techniques from explainable ML. In particular, we extend Local Interpretable Model-Agnostic Explanations (LIME)~\citep{ribeiro2016should} to
operate in parameter space. Figure~\ref{fig:LIME_BD} shows an overview of \name, which consists of three sequential phases: 1) grouping model parameters into sets, which we term \emph{superparameters}; 2) building different variants of the released model based on the presence or absence of each superparameter; 3) training an interpretable model specifying the importance of each superparameter. Next, we describe each phase of \name in detail. 

\myparagraph{Phase 1: Grouping parameters into superparameters.} Randomly changing an individual parameter of the model is unlikely to change the output of \rec noticeably, as models often contain a large number of parameters. We therefore group the model parameters into superparameters such that changing an individual superparameter significantly changes the quality of the reconstructed target point output by \rec if that superparameter stores the target point. For each layer $i$, we group the parameters into $S$ superparameters $\{\mathbf{w}^{(i,s)}\}_{s=1}^S$ based on the destination nodes of this layer, where $S$ is the total number of destination nodes. Each superparameter $\mathbf{w}^{(i,s)}$ is the $s$-th row of the parameters of the $i$-th layer, which might store a scaled copy of the input depending on the sign of the signal passed to the ReLU activations in the forward pass during training (see Section~\ref{sec:why}).


\myparagraph{Phase 2: Building different variants of the released model based on the presence or absence of each superparameter.} In order to determine whether a particular superparameter stores the target point, we capture the effect of masking out the superparameter on the \rec responses. To do so, we create $M$ perturbed models $\{\W'_m\}_{m=1}^M$, each defined by a random binary mask from $\{\mathbf{b}_m \in \{0,1\}^S\}_{m=1}^M$: superparameters marked with 1 keep their trained values, and the values of the remaining superparameters are replaced with the values used to initialise them. Reverting a superparameter to its initial values removes the effect of all updates made during training, thus removing any information about target points that might have been stored in that superparameter. 
In particular, the value of each superparameter within $\W'_m$ is set as follows:
\begin{align}
\mathbf{w'}_m^{(s)}=
    \begin{cases}
    \mathbf{w}^{(s)} &  \text{if} \quad b_m^{(s)}\neq 0 \\
    \mathbf{w}_{\text{init}}^{(s)} & \text{otherwise}\enspace.
    \end{cases}
\end{align}
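For illustration, consider a hypothetical layer with $S=3$ superparameters and the mask $\mathbf{b}_m=(1,0,1)$. The rule above then gives
\begin{align}
\mathbf{w'}_m^{(1)}=\mathbf{w}^{(1)},\qquad \mathbf{w'}_m^{(2)}=\mathbf{w}_{\text{init}}^{(2)},\qquad \mathbf{w'}_m^{(3)}=\mathbf{w}^{(3)}\enspace,
\end{align}
so only the second row is reverted to its initial values, and any change in the \rec output relative to the unperturbed model is attributable to that row.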


\myparagraph{Phase 3: Training an interpretable model specifying the importance of each superparameter.} We aim to create an interpretable model that can capture and explain the \rec responses to the presence or absence of each individual superparameter. We consider a linear regression model with a one-dimensional output, whose coefficients explain the importance of each input feature for that output. We train the linear regression model $\text{Regressor}: B \rightarrow L$, where the input is a binary vector indicating the presence or absence of each superparameter, and the output is the MSE loss between the reconstructed target example and the target example. In particular, we create the input and output of the regression model as follows: 
\begin{itemize}
    \item Output: we query \rec to obtain reconstructed images $\{\hat{\mathbf{z}}_m\}_{m=1}^{M}$ on these $M$ perturbed released models. To create a one-dimensional output for the interpretable linear regression, we compute the MSE between each $\hat{\mathbf{z}}_m$ and the target example $\mathbf{z}$ as {${\{l_m=\mathcal{L}_{\text{MSE}}(\hat{\mathbf{z}}_m,\mathbf{z})\}_{m=1}^M}$}.
    \item Input: binary masks, $\{\mathbf{b}_m \in \{0,1\}^S\}_{m=1}^M$, indicating the presence or absence of each superparameter.
\end{itemize}

Once the linear regression model is trained, we have one coefficient per superparameter that quantifies the effect of that superparameter on the quality of the reconstruction. We interpret the coefficients as follows. The presence of a superparameter with a negative coefficient can decrease the MSE reconstruction loss, thus improving the quality of the reconstruction. Conversely, the presence of a superparameter with a positive coefficient can increase the MSE reconstruction loss, thus damaging the quality of the reconstruction. Therefore, we identify superparameters with negative (or small) coefficients as important superparameters for reconstructing target points.
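
As a sketch of Phase 3, assume the regressor is fit by ordinary least squares with an intercept; the symbols $\beta_0$ and $\beta_s$ are introduced here only for illustration, and any LIME-style sample weighting is omitted because it is not specified above. The fit then reads
\begin{align}
(\hat{\beta}_0,\hat{\beta}_1,\dots,\hat{\beta}_S) \in \arg\min_{\beta_0,\dots,\beta_S} \sum_{m=1}^{M}\Big(l_m-\beta_0-\sum_{s=1}^{S}\beta_s\, b_m^{(s)}\Big)^2 \enspace,
\end{align}
and under this assumption $\hat{\beta}_s$ estimates the change in reconstruction MSE when superparameter $s$ keeps its trained values rather than being reset to its initial values, with the other superparameters held fixed.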




\subsection{Validation} 
\looseness=-1 We validate the performance of \name in estimating the importance of parameters. First, we analyze \name on released models trained with a single update step in order to evaluate how well \name detects superparameters with non-zero gradients. Figure~\ref{fig:explainAttackstep1} illustrates the coefficients of \name as a function of both the \scale of each superparameter (see Equation~\ref{eq:scale}) and the number of target points stored in each superparameter. The smaller the value of the \name coefficient, the more important the superparameter. 
As the \scale (or the number of stored target examples) increases, the \name coefficient decreases.
For example, the \name coefficient of $S3$, which stores more than $90\%$ of target points, is $-0.03$, while the coefficient of $S6$, which stores less than $25\%$ of target points, is $+0.01$.
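To read these coefficients, assume the linear surrogate of Phase 3 holds exactly, and write $\hat{l}(\mathbf{b})$ for its predicted loss and $\beta_{S3}$, $\beta_{S6}$ for the two coefficients above (notation introduced here for illustration). For two masks $\mathbf{b}$ and $\mathbf{b}'$ that differ only in that $\mathbf{b}$ retains $S3$ but not $S6$ while $\mathbf{b}'$ retains $S6$ but not $S3$,
\begin{align}
\hat{l}(\mathbf{b})-\hat{l}(\mathbf{b}') = \beta_{S3}-\beta_{S6} = -0.03-0.01 = -0.04 \enspace,
\end{align}
that is, the surrogate predicts a reconstruction MSE lower by $0.04$ when $S3$ is retained in place of $S6$.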


Second, we evaluate the behaviour of \name on released models trained for more than one step. In Figure~\ref{fig:explainAttackstep40}, we again observe that the superparameters with negative coefficients are those that perform best when we train \rec on each superparameter individually: the smaller the \name coefficient, the lower the reconstruction loss. 

As \name coefficients are aligned with the training dynamics of released models, they can be used to improve the success of \rec by shifting its focus towards important superparameters and ignoring redundant ones.




\subsection{Application}

\begin{algorithm2e}[t!]
\algsetup{linenosize=\tiny}
\small
\DontPrintSemicolon
\SetKwComment{Comment}{{\scriptsize$\triangleright$\ }}{}
\caption{\attack}
        \KwIn{Fixed set $D_{-}$, $K$ public target examples $\{\bar{z}_k\}_{k=1}^K$, shadow model training algorithm $A(\cdot)$, \rec training algorithm $B(\cdot)$.} 
        \KwOut{\name-guided \rec.}
\BlankLine
    \begin{algorithmic}[1]
     \FORALL{{$k \in \text{range}(K)$}} 
      \STATE $\bar{\W}_k \leftarrow{} A(D_{-}\cup\{\bar{z}_k\})$  \Comment*[r]{{\scriptsize Train $K$ shadow models }} 
    \ENDFOR 
    \STATE $\boldsymbol\phi \leftarrow B({\{\bar{\W}_k},\bar{z}_k\}_{k=1}^K)$  \Comment*[r]{{\scriptsize Train \rec }} 
    \STATE $ I \leftarrow \name(\boldsymbol\phi)$  \Comment*[r]{{\scriptsize Apply \name to identify the importance of superparameters }}
    \FORALL{{$k \in \text{range}(K)$}} 
       \STATE $ \tilde{\W}_k \leftarrow \text{Selector}(\bar{\W}_k,I)$  \Comment*[r]{{\scriptsize Select only important superparameters}}
    \ENDFOR
    \STATE $\tilde{\boldsymbol\phi} \leftarrow B({\{\tilde{\W}_k},\bar{z}_k\}_{k=1}^K)$  \Comment*[r]{{\scriptsize Train \rec on the selected important superparameters }}
    \STATE \textbf{return} $ \tilde{\boldsymbol\phi} $ \Comment*[r]{{\scriptsize \name-guided \rec}}
\end{algorithmic}
\label{alg:ours}
\end{algorithm2e}


We aim to improve the quality of reconstruction of training examples obtained by \rec, based on insights provided by \name regarding where and how target examples are stored in ReLU activated models. 
To do that, we design our \attack attack, in which \rec is trained only on those superparameters that have negative \name coefficients (see Algorithm~\ref{alg:ours}). 
Table~\ref{tab:ImprovedAttack} and Figure~\ref{fig:visualizationimp} show the effect of \name on the reconstruction success of the current attack: \name-guided training lowers the reconstruction loss in all four runs reported in Table~\ref{tab:ImprovedAttack} and in each of the four examples shown in Figure~\ref{fig:visualizationimp}. 
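One way to write the Selector step of Algorithm~\ref{alg:ours}, assuming $I=(I_1,\dots,I_S)$ collects the \name coefficients and that the selection threshold is zero as described above (the set notation and $\bar{\w}_k^{(s)}$, the $s$-th superparameter of the $k$-th shadow model, are illustrative notation introduced here), is
\begin{align}
\tilde{\W}_k = \text{Selector}(\bar{\W}_k, I) = \big\{\bar{\w}_k^{(s)} \,:\, I_s < 0,\ s\in\{1,\dots,S\}\big\}\enspace,
\end{align}
so that \rec receives as input only the rows whose presence is estimated to lower the reconstruction loss.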


\begin{table}[t]
    \caption{\name reduces the loss of reconstructing target points from models with ReLU activation function (lower is better).}
    \centering
    \setlength\tabcolsep{3pt}
    \begin{tabular}{|l|cccc|}
    \Xhline{3\arrayrulewidth} 
Approach &  run1 & run2 & run3 & run4 \\
   \hline
    \rec & .2040 & .2705 & .0908 & .1315\\
    \rec+\name & \textbf{.1738} & \textbf{.2385} & \textbf{.0730} & \textbf{.1158}\\
\Xhline{3\arrayrulewidth} 
    \end{tabular}
    \label{tab:ImprovedAttack}
\end{table}


\begin{figure}[t]
    \begin{tabular}{ccc}
    \scriptsize Target & \scriptsize \rec &  \scriptsize \attack \\
      \includegraphics[width=0.12\textwidth]{figures/relu_vs_sigmoid/imgs/ours/8LIME_Test_Target_ShadowInitSeed502_ShadowEpochs40_ShadowDataSeed42_NumShadows40000.pdf}
       &
       \includegraphics[width=0.12\textwidth]{figures/relu_vs_sigmoid/imgs/ours/8Test_ActRelu_ShadowInitSeed502_ShadowEpochs40_ShadowDataSeed42_NumShadows40000.pdf}
       &
       \includegraphics[width=0.12\textwidth]{figures/relu_vs_sigmoid/imgs/ours/8Mnemonist_Test_ActReLU_ShadowInitSeed502_ShadowEpochs40_ShadowDataSeed42_NumShadows40000.pdf}\\
       & \scriptsize $\mathcal{L}_{Rec}=.1161$ &  \scriptsize $\mathcal{L}_{Rec}=.0671$ \\
       \includegraphics[width=0.12\textwidth]{figures/relu_vs_sigmoid/imgs/ours/19LIME_Test_Target_ShadowInitSeed502_ShadowEpochs40_ShadowDataSeed42_NumShadows40000.pdf}
       &
       \includegraphics[width=0.12\textwidth]{figures/relu_vs_sigmoid/imgs/ours/19Test_ActRelu_ShadowInitSeed502_ShadowEpochs40_ShadowDataSeed42_NumShadows40000.pdf}
       &
       \includegraphics[width=0.12\textwidth]{figures/relu_vs_sigmoid/imgs/ours/19Mnemonist_Test_ActReLU_ShadowInitSeed502_ShadowEpochs40_ShadowDataSeed42_NumShadows40000.pdf}\\
       & \scriptsize $\mathcal{L}_{Rec}=.3042$ &  \scriptsize $\mathcal{L}_{Rec}=.0632$ \\
      \includegraphics[width=0.12\textwidth]{figures/relu_vs_sigmoid/imgs/ours/27Test_Target_ShadowInitSeed502_ShadowEpochs40_ShadowDataSeed42_NumShadows40000.pdf}
       &
       \includegraphics[width=0.12\textwidth]{figures/relu_vs_sigmoid/imgs/ours/27Test_ActRelu_ShadowInitSeed502_ShadowEpochs40_ShadowDataSeed42_NumShadows40000.pdf}
       &
       \includegraphics[width=0.12\textwidth]{figures/relu_vs_sigmoid/imgs/ours/27Mnemonist_Test_ActReLU_ShadowInitSeed502_ShadowEpochs40_ShadowDataSeed42_NumShadows40000.pdf}\\
       & \scriptsize $\mathcal{L}_{Rec}=.0533$ &  \scriptsize $\mathcal{L}_{Rec}=.0414$ \\
\includegraphics[width=0.12\textwidth]{figures/relu_vs_sigmoid/imgs/ours/34Test_Target_ShadowInitSeed502_ShadowEpochs40_ShadowDataSeed42_NumShadows40000.pdf}
       &
       \includegraphics[width=0.12\textwidth]{figures/relu_vs_sigmoid/imgs/ours/34Test_ActRelu_ShadowInitSeed502_ShadowEpochs40_ShadowDataSeed42_NumShadows40000.pdf}
       &
       \includegraphics[width=0.12\textwidth]{figures/relu_vs_sigmoid/imgs/ours/34Mnemonist_Test_ActReLU_ShadowInitSeed502_ShadowEpochs40_ShadowDataSeed42_NumShadows40000.pdf}\\
       & \scriptsize $\mathcal{L}_{Rec}=.0772$ &  \scriptsize $\mathcal{L}_{Rec}=.0458$ \\

    \end{tabular}
        \caption{Target examples and their reconstruction candidates obtained from ReLU activated fully connected models using \rec and \attack. \name helps \rec to improve the quality of reconstructions.}
    \label{fig:visualizationimp}
\end{figure}




\section{Discussion and Future Work}
\label{sec:discussion}
We provided theoretical and empirical analyses investigating the effect of the type of non-linearity used in the model specification on the quality of reconstructing examples from the model parameters. We proposed a theoretically motivated explanation technique, \name, to locate model parameters that memorize training examples, thus improving the quality of reconstructions of the current attack on small MLPs with ReLU activations. Below, we discuss some limitations of our approach and promising directions for future work.

\myparagraph{Model architecture.} Fully connected neural networks are the focus of many current training data reconstruction attacks~\citep{fowl2021robbing,boenisch2021curious,haim2022reconstructing}, and yet it is not fully understood what training conditions lead to successful reconstruction attacks in these networks. 
Our work investigates the necessary properties of model specification in fully connected neural networks that enable better reconstruction attacks. However, an interesting future direction is to extend our proposed explanation technique to other architectures such as Convolutional Neural Networks (CNNs) with ReLU activations. 
If the adversary has the ability to choose or design the model architecture~\citep{fowl2021robbing,boenisch2021curious}, then Section~\ref{sec:theory} shows that by using a fully connected layer in the first layer, reconstruction attacks become easier. This is because complete copies of targets can be stored in first layer updates (see Equation~\ref{eq:scale}).
Indeed, our \name approach can be used in this setting to help identify the parameters of linear layers that store these examples. 

\myparagraph{Extending our approach to CNNs in a benign setting.} In more benign settings, where the adversary cannot manipulate or choose the model architecture, applying our approach to standard CNN architectures requires extending our theoretical analysis to establish how data points are memorised in each convolutional layer. CNNs typically have fully connected layers at the end: the last convolutional layer outputs embeddings, which are the input of the first fully connected layer. Therefore, our approach can extend to CNNs using an embedding reconstruction attack followed by an embedding-to-data mapping, as follows: i) dropping all convolutional layers and performing the attack on the first fully connected layer, to which our fully connected based method and analyses apply; ii) training \rec such that it receives the parameters of the first fully connected layer and tries to reconstruct the corresponding embeddings; and iii) training a network that maps the embeddings to data. 

\myparagraph{Efficiency.}
The reconstruction attacks described in this work inherit the computational bottleneck of the attack described by~\citet{informed_adversary}, as (1) thousands of shadow models need to be trained, and (2) \rec uses the parameters of the shadow model as input, which can become extremely large for large models and high dimensional datasets. Reducing the number of shadow models that need to be trained, or reducing the number of parameters that need to be passed as inputs to \rec, can improve the efficiency of the attack. In turn, this will allow us to scale reconstruction attacks to larger datasets and models. For example, our experiments show that we do not need the full set of parameters to perform good reconstruction attacks, which opens the door for future work on identifying and reducing the number of parameters we need to use as inputs.

\myparagraph{The quality of the data reconstruction versus privacy-accuracy trade-offs in DPSGD.} \citet{papernot2021tempered} have demonstrated that replacing ReLU activation functions with smooth activation functions can improve the trade-offs between privacy and accuracy of Differentially Private Stochastic Gradient Descent (DPSGD). However, we demonstrate that the informed adversary proposed by~\citet{informed_adversary} cannot successfully reconstruct training examples from ReLU activated models. This apparent conflict between the effect of the activation function on the quality of data reconstruction and its effect on privacy-accuracy trade-offs in DPSGD presents a promising direction for future work.

\section*{Acknowledgments}
The authors would like to thank David Stutz for feedback on an earlier version of this manuscript. Adrian Weller acknowledges support from EPSRC grant EP/V056883/1, a Turing AI Fellowship under EP/V025279/1, and the Leverhulme Trust via CFI. Code will be made available.



% References
\bibliography{shahin-shamsabadi_458}
\end{document}
