\section{Brain Age Model, Training and Dataset Details}\label{sec:appendix_training_data_details}





In both the federated and centralized setups, we used T1 structural MRI scans of healthy subjects from the UK Biobank dataset~\cite{ukbb} for brain age prediction. All scans were preprocessed with the same technique as \citet{lam2020accurate}, resulting in final images with dimensions $91 \times 109 \times 91$. Here we briefly describe the relevant details and refer the reader to \citet{gupta2021improved} and \citet{stripelis2021scaling} for the full description.

\subsection{Centralized Training Setup}\label{subsec:centralized_training_details}
\begin{table}[!htbp]
    \centering
    \begin{tabular}{l c c c}
        \toprule
        Model & Train & Test & Validation\\
         \cmidrule(r){1-1} \cmidrule(lr){2-2} \cmidrule(lr){3-3} \cmidrule(lr){4-4}
        \texttt{3D-CNN}        & 1.39  & 3.13 & 3.09  \\
        \texttt{2D-slice-mean} & 0.77  & 2.88 & 2.92  \\
        \bottomrule
    \end{tabular}
    \caption{Mean absolute error (in years) on the training, test, and validation sets in the centralized setup.}
    \label{tab:centralized_setup_performance}
\end{table}


To simulate attacks on centrally trained deep neural network models, we adopted the pretrained models from~\citet{gupta2021improved}. The authors selected a subset of 10,446 healthy subjects from the 16,356 subjects in the UK Biobank dataset to create training, validation, and test sets of size 7,312, 2,194, and 940, respectively, with a mean chronological age of 62.6 years and a standard deviation of 7.4 years. \citet{gupta2021improved} proposed novel 2D-slice-based architectures to improve brain age prediction. Their architectures use 2D convolutions to encode the slices along the sagittal axis and aggregate the resulting embeddings through permutation-invariant operations. In our work, we use the \texttt{2D-slice-mean} model, which demonstrated the best performance in their study, and a conventional \texttt{3D-CNN} model, which is often used to process MRI scans~\cite{peng2021accurate,cole2017predicting}. The architectures of both models are shown in \figureref{fig:arch_brain_age} and discussed in \sectionref{subsec:architecture_diagrams}.

For the brain age problem, performance is measured as the mean absolute error (MAE) between the predicted and true age on the held-out test set. In \citet{gupta2021improved}, the models were trained for 100 epochs, and the best model was selected based on its performance on the validation set.
The membership inference attacks that we investigate in this work are evaluated on the models produced at the end of the $100^{\mathrm{th}}$ epoch. \rebuttal{\tableref{tab:centralized_setup_performance} reports the performance of these models, i.e., the MAE on the training, test, and validation sets at the end of the $100^{\mathrm{th}}$ epoch.}
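
For concreteness, the MAE can be written as follows. The notation is ours and is introduced only for illustration: $N$ is the number of subjects in the evaluated set, $y_i$ is the chronological age of subject $i$, and $\hat{y}_i$ is the age predicted by the model.
\[
    \mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{y}_i - y_i \right| .
\]
The MAE is expressed in years, and lower values indicate more accurate predictions.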





\subsection{Federated Training Setup}\label{subsec:federated_training_details}
\begin{figure}[htpb]
    \centering
    \subfigure[%
        Uniform \& IID\label{subfig:UKBB_AgeBuckets_Uniform_IID}
    ]{%
        \includegraphics[width=0.3\textwidth]{figures/federated_training_dist/AgeBuckets.brainage.cnn5.federation.8FastLearners_atBDNF.SyncFedAvg.uniform_datasize_iid_v3.png}
    }
    \subfigure[%
        Uniform \& non-IID\label{subfig:UKBB_AgeBuckets_Uniform_NonIID}
    ]{
        \includegraphics[width=0.3\textwidth]{figures/federated_training_dist/AgeBuckets.brainage.cnn5.federation.8FastLearners_atBDNF.SyncFedAvg.uniform_datasize_noniid_v3.png}
    }
    \subfigure[%
        Skewed \& non-IID\label{subfig:UKBB_AgeBuckets_Skewed_NonIID}
    ]{
        \includegraphics[width=0.3\textwidth]{figures/federated_training_dist/AgeBuckets.brainage.cnn5.federation.8FastLearners_atBDNF.SyncFedAvg.skewed_datasize_noniid_v3.png}
    }
    \subfigure[%
        Uniform \& IID\label{subfig:UKBB_AgeDistribution_Uniform_IID}
    ]{
        \centering\includegraphics[width=0.3\textwidth]{figures/federated_training_dist/AgeDistributions.brainage.cnn5.federation.8FastLearners_atBDNF.SyncFedAvg.uniform_datasize_iid.png}
    }
    \subfigure[%
        Uniform \& non-IID \label{subfig:UKBB_AgeDistribution_Uniform_NonIID}
    ]{
    \centering\includegraphics[width=0.3\textwidth]{figures/federated_training_dist/AgeDistributions.brainage.cnn5.federation.8FastLearners_atBDNF.SyncFedAvg.uniform_datasize_noniid.png}
    }
    \subfigure[%
        Skewed \& non-IID\label{subfig:UKBB_AgeDistribution_Skewed_NonIID}
    ]{
        \centering\includegraphics[width=0.3\textwidth]{figures/federated_training_dist/AgeDistributions.brainage.cnn5.federation.8FastLearners_atBDNF.SyncFedAvg.skewed_datasize_noniid_v2.png}
    }
    \caption{%
    The UK Biobank data distribution across 8 learners for the three federated learning environments. Figures \subfigref{subfig:UKBB_AgeBuckets_Uniform_IID,subfig:UKBB_AgeBuckets_Uniform_NonIID,subfig:UKBB_AgeBuckets_Skewed_NonIID} present the amount of data per age range bucket (i.e., $[39,50)$, $[50,60)$, $[60,70)$, $[70,80)$) per learner. Figures \subfigref{subfig:UKBB_AgeDistribution_Uniform_IID,subfig:UKBB_AgeDistribution_Uniform_NonIID,subfig:UKBB_AgeDistribution_Skewed_NonIID} present the age distribution (mean $\mu$ and standard deviation $\sigma$) per learner. Figures are reproduced from \citet{stripelis2021scaling}.
    }%
    \label{fig:ukbb_federated_data_distribution}
\end{figure}
\begin{figure}[htpb]
    \centering
    \includegraphics[width=0.5\textwidth]{figures/federated_training_perf/2D_Model_convergence_with_all_federated_learning_distributions.pdf}
    \caption{Learning curve (test performance) of the \texttt{2D-slice-mean} model across the different federated learning environments. The model is evaluated at each federation round on the brain age prediction problem. The more non-IID and unbalanced the data distribution is, the harder it is for the federation model to converge.}\label{fig:2DModel_federation_convergence}
\end{figure}



\begin{table}[!htbp]
    \centering
    \begin{tabular}{l c c c c c c}
        \toprule
        \multirow{2}{*}{Model} & \multicolumn{2}{c}{Uniform \& IID}  & \multicolumn{2}{c}{Uniform \& non-IID}& \multicolumn{2}{c}{Skewed  \& non-IID} \\
         \cmidrule(lr){2-3} \cmidrule(lr){4-5} \cmidrule(lr){6-7}
        & Train & Test & Train & Test & Train & Test \\
        \cmidrule(r){1-1} \cmidrule(lr){2-2} \cmidrule(lr){3-3} \cmidrule(lr){4-4} \cmidrule(lr){5-5} \cmidrule(lr){6-6} \cmidrule(lr){7-7}
        \texttt{3D-CNN} & 2.16  & 3.01 & 3.41 & 3.81 & 2.83 &  3.47 \\
        \texttt{2D-slice-mean} & 1.81  &  2.76 & 2.40 & 2.98  & 2.42 &  3.10 \\
        \bottomrule
    \end{tabular}
    \caption{Mean absolute error (in years) on the training and test sets for the different environments in the federated setup.}
    \label{tab:federated_setup_performance}
\end{table}
To simulate membership inference attacks on models trained in federated learning environments, we used the pretrained models, dataset, and training setup of~\citet{stripelis2021scaling}. The investigated learning environments consist of 8~learners with homogeneous computational capabilities (8~GeForce GTX 1080 Ti graphics cards with 10~GB RAM each) and heterogeneous local data distributions. The 10,446 subject records of the UK Biobank dataset were split into 8,356 training and 2,090 test samples. Three representative federated learning environments were generated, which differ in the amount of records per learner (i.e., Uniform and Skewed) and in the subject age range distribution across learners (i.e., IID and non-IID). All these environments are presented in \figureref{fig:ukbb_federated_data_distribution}.

To perform our attacks, we considered the community models received by each learner in all federation rounds. Specifically, we used the pretrained \texttt{3D-CNN} community models from \citet{stripelis2021scaling}, which were trained for 25 federation rounds, with every learner performing local updates on the received community model parameters for $4$ epochs in each round. To train the \texttt{2D-slice-mean} federation model, we emulated a similar training setup for 40 federation rounds. For both federated models, the solver of the local objective is SGD, the batch size is $1$, the learning rate is $5 \times 10^{-5}$, and every learner used all its local data during training, without reserving any samples for validation. Finally, at every federation round all local models are aggregated using the Federated Averaging (FedAvg) aggregation scheme~\cite{mcmahan2017communication}. The convergence of the \texttt{2D-slice-mean} federated model for the three federated learning environments is shown in \figureref{fig:2DModel_federation_convergence} \rebuttal{and the performance of the final community models for each learning environment is summarized in \tableref{tab:federated_setup_performance}}.
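
As a brief illustration of the aggregation step, FedAvg in its standard form~\cite{mcmahan2017communication} computes the community model as a data-size-weighted average of the local models. The notation below is ours and is introduced only for illustration, and we assume the standard weighting by local dataset size: $K$ is the number of learners ($K = 8$ here), $n_k$ is the number of training samples held by learner $k$, $n = \sum_{k=1}^{K} n_k$, and $w_k^{(t)}$ denotes the parameters of learner $k$ after its local updates in round $t$.
\[
    w^{(t+1)} = \sum_{k=1}^{K} \frac{n_k}{n} \, w_k^{(t)} .
\]
When all learners hold the same amount of data, the weights reduce to $1/K$ and the community model is the simple average of the local models; when the data sizes differ, learners with more data contribute proportionally more.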






\subsection{\texttt{3D-CNN} and \texttt{2D-slice-mean} Model Architectures}
\label{subsec:architecture_diagrams}
\begin{figure}
    \centering
    \subfigure[3D-CNN]{%
        \label{fig:3d_cnn}
        \includegraphics[width=0.4\textwidth]{figures/arch/3d-cnn.pdf}
    }
    \subfigure[2D-slice-mean]{%
        \label{fig:2d_cnn}
        \includegraphics[width=0.4\textwidth]{figures/arch/2d-cnn.pdf}
    }
    \caption{Neural network architectures for brain age prediction. Gray blocks indicate trainable modules, and non-parametric operations are indicated on the arrows. Groups of parameters are labelled for ease of reference.}
    \label{fig:arch_brain_age}
\end{figure}

\paragraph{\texttt{3D-CNN}:}
\figureref{fig:3d_cnn} describes the architecture of the \texttt{3D-CNN} model. \texttt{3D-CNN} uses 5 convolutional blocks consisting of 3D convolution layers with 32, 64, 128, 256, and 256 filters, respectively. Each convolutional layer is followed by 3D max-pooling, 3D instance normalization, and a \texttt{ReLU} non-linearity. The resulting activations are passed through a 64-filter convolutional layer of kernel size 1, average pooled, and passed through another 3D convolutional layer of kernel size 1 to produce the one-dimensional brain age output.

\paragraph{\texttt{2D-slice-mean}:}
\figureref{fig:2d_cnn} describes the architecture of the \texttt{2D-slice-mean} model.
This architecture encodes each slice along the sagittal dimension using a slice encoder. The slice encoder is similar to the \texttt{3D-CNN} model but uses the 2D version of all the operations. Ultimately, every slice is projected to a 32-dimensional embedding. The slice-mean operation aggregates these 32-dimensional embeddings by averaging them, and the resulting embedding is then passed through feed-forward layers to output the brain age.
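
The computation above can be sketched compactly. The notation is ours and is introduced only for illustration: $x_1, \dots, x_S$ denote the $S$ sagittal slices of a scan, $f_\theta$ is the slice encoder that maps a slice to its 32-dimensional embedding, and $g_\phi$ stands for the feed-forward layers.
\[
    \hat{y} = g_\phi\!\left( \frac{1}{S} \sum_{s=1}^{S} f_\theta(x_s) \right) .
\]
Because the mean does not depend on the order of its arguments, the predicted age $\hat{y}$ is unchanged under any permutation of the slices.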
