\section{Introduction}\label{sec:intro}
Machine learning's endless appetite for data is increasingly in tension with the desire for data privacy. Privacy is a major concern in medical research fields such as neuroimaging, where information leakage may have legal implications or severe consequences for individuals' quality of life. The \textit{Health Insurance Portability and Accountability Act of 1996} (HIPAA)~\cite{hipaa} protects the health records of an individual subject, as well as data collected for medical research. Privacy laws have spurred research into anonymization algorithms, for example, algorithms that remove facial information from MRI scans~\cite{bischoff2007technique,schimke2011quickshear,milchenko2013obscuring}.

While there are laws and guidelines to control the sharing of private data, sharing or using models learned from private data may also leak information. The risk to participants' privacy, even when only summary statistics are released, has been demonstrated and widely discussed in the field of genome-wide association studies~\cite{homer2008resolving,craig2011assessing}. In a similar spirit, a neural network model learned from private data can be seen as a summary statistic of the data, and private information may be extracted from it. To demonstrate the feasibility of information leakage, we study the problem of extracting information about individuals from a model trained on the `brain age prediction' regression task using neuroimaging data. Brain age is the estimate of a person's age from their brain MRI scan, and predicting it is a commonly used task for benchmarking machine learning algorithms.

In particular, we study attacks that infer which samples or records were used to train the model. These are called \textit{Membership Inference attacks}~\cite{shokri2017,nasr2019}. Through such attacks, an adversary may infer whether an individual's data was used to train the model, thereby violating that individual's privacy. Consider a hypothetical example in which researchers release a neural network trained on scans of participants in a depression study. An adversary with access to an individual's scan and the model may determine whether that individual participated in the study, revealing information about their mental health, which can have undesirable consequences.

Previous work on membership inference attacks focuses on supervised \emph{classification} problems, often exploiting the model's over-confidence on the training set and the high dimensionality of the probability vector~\cite{shokri2017,salem2019ml,pyrgelis2017knock}. Our work demonstrates membership inference attacks on \emph{regression} models trained to predict a person's age from their brain MRI scan ({brain age}) under both white-box and black-box setups. We simulate attacks on models trained under centralized as well as distributed, federated setups. We also demonstrate a strong empirical connection between overfitting and the vulnerability of the model to membership inference attacks.





