\section{Related Work \& Background}\label{sec:prelim}


\subsection{BrainAGE Problem}\label{subsec:sub_brain_age}

\emph{Brain age} is an estimate of a person's age from a structural MRI scan of their brain. The difference between a person's true chronological age and the predicted age is a useful biomarker for early detection of various neurological diseases~\cite{TenYearsBrainAge}, and the problem of estimating this difference is known as the \emph{Brain Age Gap Estimation} (BrainAGE) problem. Brain age prediction models are trained on brain MRIs of healthy subjects to predict chronological age. A larger gap between predicted and chronological age is often considered an indicator of accelerated aging in the subject, which may be a prodrome of neurological diseases.
To predict age from raw 3D-MRI scans, many recent papers have proposed using deep learning~\cite{FENG202015,gupta2021improved,stripelis2021scaling,peng2021accurate,MRISignBrainAge,lam2020accurate}.
To simulate attacks on models trained in centralized and federated settings, we employ trained networks and training setups recently proposed in~\citet{gupta2021improved} and~\citet{stripelis2021scaling}, respectively. Although there is some controversy over the interpretation of BrainAGE~\cite{butler2020statistical,vidal2021brain}, we emphasize that we use BrainAGE only as a representative problem in neuroimaging that benefits from deep learning.
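
As a notational sketch (the symbols below are introduced here for illustration and are not taken from the cited works), let $x$ denote a subject's MRI scan, $y$ their chronological age, and $f_\theta$ a trained brain age model. Under the common convention of subtracting chronological age from predicted age, which is an assumption of this sketch, the brain age gap is
\[
\Delta(x, y) = f_\theta(x) - y,
\]
so that $\Delta(x, y) > 0$ corresponds to a brain that appears older than the subject's chronological age.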


\subsection{Federated Learning}\label{sec:sub_federated_learning}

In traditional machine learning pipelines, data originating from multiple data sources must be aggregated at a central repository for further processing and analysis.
Such an aggregation step may introduce privacy vulnerabilities or violate regulatory constraints and data sharing laws, making data sharing across organizations prohibitive. To address this limitation, \emph{Federated Learning} was recently proposed as a distributed machine learning paradigm that allows institutions to collaboratively train machine learning models by removing the need to share private data and instead pushing model training to each data source~\cite{mcmahan2017communication,yang2019federated,MAL-083}.
Even though Federated Learning was originally developed for mobile and edge devices, it is increasingly applied in biomedical and healthcare domains due to its inherent privacy advantages~\cite{lee2018privacy,sheller2018multi,silva2019federated,rieke2020future,silva2020fed}.

Depending on the communication characteristics between the participating sources, different federated learning topologies can be discerned~\cite{yang2019federated,bonawitz2019towards,rieke2020future,bellavista2021decentralised}, with \textit{star} and \textit{peer-to-peer} being the most prominent.
In a star topology~\cite{sheller2018multi,li2019privacy,li2020multi,stripelis2021scaling}, execution and training across sources are coordinated by a trusted centralized entity, the \emph{federation controller}, which is responsible for shipping the global or \emph{community model} to participating sites and aggregating the local models.
In peer-to-peer topologies~\cite{roy2019braintorrent}, the participating sites communicate directly with each other without requiring a centralized controller.
We focus on the star federated learning topology.


In principle, at the beginning of the federation training, every participating data source or \emph{learner} receives the community model from the federation controller, trains the model independently on its local data for an assigned number of iterations, and sends the locally trained parameters to the controller.
The controller computes the new {community model} by aggregating the learners' parameters and sends it back to the learners to continue training.
We refer to this synchronization point as a \emph{federation round}.
After repeating multiple federation rounds, the jointly learned {community model} is produced as the final output.
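
One common aggregation rule, used here purely as an illustrative assumption, is the weighted average of Federated Averaging~\cite{mcmahan2017communication}. With $K$ learners, where learner $k$ holds $n_k$ training samples and returns parameters $w_k^{(t)}$ at round $t$ (notation introduced for this sketch), the controller would compute
\[
w_c^{(t+1)} = \sum_{k=1}^{K} \frac{n_k}{\sum_{j=1}^{K} n_j}\, w_k^{(t)},
\]
and send $w_c^{(t+1)}$ back to the learners as the next community model. Other aggregation rules fit the same round structure.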


\subsection{Membership Inference Attacks}\label{sec:sub_membership_attacks}

Membership inference attacks are among the most popular attacks for evaluating privacy leakage in practice~\cite{jayaraman2019evaluating}. The malicious use of a trained model to infer which subjects were part of its training set, given access to some or all attributes of those subjects, is termed a \textit{membership inference attack}~\cite{shokri2017,nasr2019}.
These attacks aim to infer whether a record (a person's MRI scan in our case) was used to train the model, thereby revealing the subject's participation in the study, which could have legal implications.
These attacks are often distinguished by the information available to the adversary~\cite{nasr2019}. Most successful membership inference attacks in the deep neural network literature require access to some parts of the training data or at least some samples from the training data distribution~\cite{salem2019ml,pyrgelis2017knock,truex2018towards}. \emph{White-box attacks} assume that the attacker also knows the training procedure and has access to the trained model parameters, whereas \emph{black-box attacks} assume only unlimited access to an API that provides the output of the model~\cite{leino2020stolen,nasr2019}.

Creating efficient membership inference attacks with minimal assumptions and information is an active area of research~\cite{choo2020label,jayaraman2020revisiting,song2020systematic}. Our work, however, focuses on demonstrating the vulnerability of deep neural networks to membership inference attacks in both federated and non-federated setups. Therefore, we make straightforward assumptions and allow the adversary somewhat lenient access to information. Our attack models are inspired by~\citet{nasr2019,shokri2017}, and we use similar features, such as gradients of parameters, activations, predictions, and labels, to simulate membership inference attacks.
In particular, we learn deep binary classifiers to distinguish training samples from unseen samples using these features.
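
Schematically, and with notation that is introduced only for this sketch, let $f_\theta$ be the target model, $\ell$ its training loss, and $(x, y)$ a candidate record. The attack can be viewed as a binary classifier $h_\phi$ applied to features extracted from the target model:
\[
h_\phi\bigl(\nabla_\theta\, \ell(f_\theta(x), y),\; a_\theta(x),\; f_\theta(x),\; y\bigr) \in [0, 1],
\]
where $a_\theta(x)$ denotes intermediate activations and the output is interpreted as the estimated probability that $(x, y)$ was part of the training set. Which subset of these features is available depends on the access assumed for the adversary.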



In the case of federated learning, each learner receives model parameters and has some private training data. Thus, any learner is capable of launching white-box attacks. Moreover, in this scenario, the learner has access to the community models received at each federation round.
When simulating membership inference attacks on federated models, we take the perspective of a learner: the attack model is trained on that learner's private data, and the task is to identify other learners' subjects. In the case of models trained via centralized training, we assume that the adversary can access some public training and test samples. We simulate both white-box and black-box attacks in this case.












