\section{Attack Architecture and Training Details}\label{sec:attack_arch}
\subsection{Attack Classifier Parametrization}
We train deep binary classifiers that take different features as input and output the probability that a sample belongs to the model's training set.
We presented the importance of different features derived from a sample and the trained model for membership inference attacks in \sectionref{subsec:centralized_result}. In a black-box attack, the attacker can only use the model's output. In a white-box attack, by contrast, the attacker may also exploit knowledge of the model's internal workings. We use gradient and activation information to simulate such attacks.

For preliminary experiments on using activations for attacks, we repurpose the model's architecture to create a binary classifier. For example, in \figureref{fig:3d_cnn}, when simulating an attack that uses activations from the second hidden layer, i.e., after the \texttt{conv 2} layer, we used a classifier that had the layers from \texttt{conv 3} to \texttt{output}. However, as discussed in \sectionref{subsec:centralized_result}, activations are not very useful features for membership attacks, so we did not run further experiments with them.

To compute membership inference attacks using only the error feature, which is a 1D feature, we used a random forest classifier. For the other features, i.e., predictions, labels, gradients, and gradient magnitudes, we used a generic setup in which each feature is mapped to a 64-dimensional embedding by its own encoder. The embeddings are then concatenated and passed through a dropout layer and a linear layer to output the logit. Below we describe the encoder architecture for each feature. We did not perform an extensive architecture search, but we observed that the results are not very sensitive to the specific encoder architecture.
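
In symbols, the generic setup can be sketched as follows. The notation is ours and is introduced only for illustration: $x_k$ denotes the $k$-th input feature, $f_k$ its encoder, $K$ the number of features used, and $w$, $b$ the parameters of the final linear layer. We also assume that the logit is mapped to a membership probability by the logistic sigmoid $\sigma$.
\[
e_k = f_k(x_k), \qquad
z = w^\top \mathrm{Dropout}\bigl([e_1; \dots; e_K]\bigr) + b, \qquad
p = \sigma(z),
\]
where each $e_k$ is a 64-dimensional embedding, $[\,\cdot\,;\,\cdot\,]$ denotes concatenation, $z$ is the logit, and $p$ is the predicted probability that the sample is a training-set member.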

\newcommand{\relu}{\texttt{ReLU} }
\begin{itemize}
    \item
    \textbf{Prediction and label:}
    Prediction and label form a two-dimensional continuous feature. To create the embeddings, these are processed via a linear layer and \relu non-linearity.

    \item
    \textbf{Gradient magnitudes:}
    We use the parameter-gradient magnitudes of each layer as features, resulting in a 14-dimensional feature vector for the \texttt{3D-CNN} model and an 18-dimensional feature vector for the \texttt{2D-slice-mean} model. These are processed via a linear layer and \relu non-linearity to generate the embedding.

    \item
    \textbf{\texttt{conv 1} gradients:}
    The size of the \texttt{conv 1} gradient feature is 288 ($3\times 3 \times 1\times 32$) for \texttt{2D-slice-mean} and 864 ($3 \times 3 \times 3 \times 1\times 32$) for \texttt{3D-CNN}. We project this feature vector to the desired embedding size (64) using a linear layer followed by \relu non-linearity.

    \item
    \textbf{\texttt{conv 6} gradients:}
    For \texttt{3D-CNN}, the feature dimension is $1\times 1\times 1\times 256 \times 64$. We reshape it to $1\times 256\times 64$ and process it through three convolutional blocks, each consisting of a 2D-convolution layer, a max-pool, and a \relu non-linearity, with 64, 64, and 16 output filters, respectively. Finally, we pass the resulting activation of size $16\times 6\times 6$ through a linear layer and a \relu non-linearity to get the desired 64-dimensional embedding. The convolution kernel sizes were $5\times 5$, $4\times 2$, and $4\times 3$, and the max-pool kernel sizes were $4\times 2$, $4\times 2$, and $2\times 2$.

    For \texttt{2D-slice-mean}, the feature dimension is $1\times 1\times 256 \times 64$. We reshape it to $64\times 256$ and process it through three convolutional blocks, each consisting of a 1D-convolution layer, a max-pool, and a \relu non-linearity, with 128 output filters in each block. Finally, we process the resulting activation of size $128\times 14$ through a linear layer to get the embedding. The convolution kernel sizes were $5$, $4$, and $3$, and the 1D-max-pool kernel sizes were $4$, $2$, and $2$.

    \item
    \textbf{\texttt{output} gradients:}
    This layer has a different number of parameters in the two models, so we used a different encoder for each.

    For \texttt{3D-CNN}, the feature dimension is $1\times 1\times 1\times 64 \times 1$. It is reshaped to a 64-dimensional vector and passed through a linear layer and \relu non-linearity to get the embedding.

    For \texttt{2D-slice-mean}, we consider the two final feed-forward layers as the \texttt{output} layer. One of these layers has dimensions $64\times 1$ and is encoded in the same way as \texttt{3D-CNN}'s \texttt{output} layer. The other feed-forward layer has parameters of size $32\times 64$. We process them through three convolutional blocks, each consisting of a 1D-convolution layer, a max-pool, and a \relu non-linearity, with 64 output filters in each block. Finally, we process the resulting activation of size $64\times 4$ through a linear layer to get the 64-dimensional embedding. All convolution kernel sizes were set to $3$, and all 1D-max-pool kernel sizes were $2$.
\end{itemize}
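
As a consistency check on the two \texttt{conv 6} encoders, the stated activation sizes can be reproduced under an assumption that is ours and is not stated in the descriptions: convolutions are unpadded with stride 1, and each max-pool uses a stride equal to its kernel size. Under this assumption, a convolution with kernel size $k$ maps a length $n$ to $n-k+1$, and a max-pool with kernel size $q$ maps $n$ to $\lfloor n/q \rfloor$. For the \texttt{3D-CNN} encoder, the spatial size then evolves through the alternating convolution and max-pool steps as
\[
256\times 64 \to 252\times 60 \to 63\times 30 \to 60\times 29 \to 15\times 14 \to 12\times 12 \to 6\times 6,
\]
and for the \texttt{2D-slice-mean} encoder, the length evolves as
\[
256 \to 252 \to 63 \to 60 \to 30 \to 28 \to 14,
\]
which agrees with the $16\times 6\times 6$ and $128\times 14$ activation sizes given for these encoders.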

\paragraph{Note:}When using features from multiple trained models for an attack (e.g., in the case of federated training), we compute the logits using the deep classifiers described above and use the average logit to compute the probability. The classifier parameters are shared across features from different models. The intuition behind averaging is that averaging in $\log$ space amounts to treating the prediction obtained from each trained model's features independently.
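
To make this concrete, the following uses our own notation, introduced only for illustration. Let $z_m$ be the logit that the shared classifier produces from the features of the $m$-th trained model, $m = 1, \dots, M$, and let $p_m = \sigma(z_m)$, where $\sigma$ is the logistic sigmoid. Since $z_m = \log\bigl(p_m/(1-p_m)\bigr)$, the averaged prediction satisfies
\[
p = \sigma\Bigl(\frac{1}{M}\sum_{m=1}^{M} z_m\Bigr)
\qquad\mbox{and}\qquad
\frac{p}{1-p} = \prod_{m=1}^{M}\Bigl(\frac{p_m}{1-p_m}\Bigr)^{1/M},
\]
so the combined odds are the geometric mean of the per-model odds, with each trained model contributing one multiplicative factor.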

\subsection{Training}
To train the attack models, we used the \texttt{Adam} optimizer with a batch size of 64 and a learning rate of $10^{-3}$. We trained the models for a maximum of 100 epochs with a patience of 20 epochs and chose the best model by its performance on a validation set, created by an 80--20 split of the training data. Training data creation for the attack models is described in \sectionref{subsec:attack_setup}.