\subsection{Membership Inference Attacks on Centralized Training}\label{subsec:centralized_result}
\begin{figure}[htbp]
    \centering
    \subfigure[Prediction errors]
    {%
    \label{fig:error_hist} \includegraphics[width=0.4\textwidth]{centralized_histogram/error_hist.pdf}
    }%
    \subfigure[Gradients of \texttt{conv 1} layer]
    {%
    \label{fig:grad_hist}
    \includegraphics[width=.4\textwidth]{centralized_histogram/grad_hist.pdf}
    }
    \caption{Distribution of prediction error and gradient magnitudes from the trained models.}
    \label{fig:histogram}
\end{figure}
\begin{table}[htbp]
    \centering
    \begin{tabular}{lcc}
        \toprule
        {Features} & \texttt{3D-CNN}  & \texttt{2D-slice-mean}\\
        \cmidrule(lr){1-1} \cmidrule(lr){2-2} \cmidrule(lr){3-3}
        activation                & $56.63$           & -  \\
        error                     & $59.90 \pm 0.01$  & $74.06\pm 0.00$ \\
        gradient magnitude        & $72.60 \pm 0.45$  & $78.34\pm 0.17$ \\
        gradient (\texttt{conv 1} layer)    & $71.01 \pm 0.64$  & $80.52\pm 0.40$ \\
        gradient (output layer)    & $76.65\pm 0.44$   & $82.16\pm 0.29$ \\
        gradient (\texttt{conv 6} layer)    & $76.96\pm 0.57$   & $82.89\pm 0.83$ \\
        prediction + label        & $76.45\pm 0.20$   & $81.70\pm 0.29$ \\
        prediction + label + gradient (\texttt{conv 6 + output})& $78.05\pm 0.47$ & $83.04\pm 0.50$ \\
        \bottomrule
    \end{tabular}
    \caption{Membership inference attack accuracies (in \%) on centrally trained models (averaged over 5 attacks). Details about the \texttt{conv 1}, \texttt{output}, and \texttt{conv 6} layers are provided in \appendixref{subsec:architecture_diagrams}.}
    \label{tab:centralized_training}
\end{table}
\tableref{tab:centralized_training} summarizes the results of simulated membership inference attacks using various features. As is apparent from \figureref{fig:error_hist}, test and train samples have different error distributions, owing to the inherent tendency of deep neural networks to overfit the training set~\cite{zhang2017rethinking}.
Consequently, the error is a useful feature for membership inference attacks. The error is the difference between the prediction and the label; using the prediction and the label as two separate features produced even stronger attacks, raising attack accuracy from 59.90 to 76.45 for \texttt{3D-CNN} and from 74.06 to 81.70 for \texttt{2D-slice-mean} (\tableref{tab:centralized_training}).
One possible reason is that the model overfits more for some age groups. Access to the true age (label) would let the attack model identify these age groups, resulting in higher attack accuracy.
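
The comparison between the two black-box feature sets can be sketched in notation introduced here purely for illustration; the symbols $\hat{y}$, $y$, $e$, $\tau$, the sign convention of the error, and the threshold form of the attack are assumptions and not the attack model used in the experiments. Writing $\hat{y}$ for the predicted age and $y$ for the true age, the error feature is
\[
  e = \hat{y} - y ,
\]
so $e$ is a deterministic function of the pair $(\hat{y}, y)$. A simple error-based attack would declare a sample a training member when $|e| \le \tau$ for a single global threshold $\tau$. An attack with access to $(\hat{y}, y)$ can represent the same rule and, in addition, rules of the form
\[
  |\hat{y} - y| \le \tau(y) ,
\]
in which the threshold varies with age. If the model overfits more for some age groups, such a label-dependent threshold may separate members from non-members better than any single $\tau$, which is consistent with the higher accuracies observed for the prediction + label features.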



Attacks that use the error, or the prediction and the label, are black-box attacks. A white-box attacker may also exploit information about the models' internal workings, such as the gradients, the loss function, and the training algorithm. \rebuttal{Deep learning models are commonly trained until convergence using some variant of gradient descent. Convergence is reached when the gradient of the loss w.r.t.\ the parameters on the training set is close to 0. As a result, gradient magnitudes for unseen samples are similar to or higher than those for training samples (see \figureref{fig:grad_hist}).} We therefore used the gradient magnitude of each layer as a feature, which resulted in attack accuracies of 72.60 and 78.34 for the \texttt{3D-CNN} and \texttt{2D-slice-mean} models, respectively. Finally, we simulated attacks using the gradients of parameters at different layers.\footnote{We consider layers close to the input or output, as these have fewer parameters and their attack models are easy to train. Intermediate layers have more parameters, which makes the attack model hard to learn.} We find that parameter gradients of layers closer to the output (i.e., the \texttt{conv 6} and \texttt{output} layers) are more effective than gradients of layers closer to the input (\texttt{conv 1}). Preliminary results hinted that activations provide little information for attacking the models, so we did not simulate attacks on the \texttt{2D-slice-mean} models with activations as features. The best attack accuracies, 78.05 for \texttt{3D-CNN} and 83.04 for \texttt{2D-slice-mean}, were achieved by combining the prediction, the label, and the gradients of parameters close to the output layer. \rebuttal{The successful membership inference attacks demonstrated in this section accessed samples from the training set, which is a limiting assumption. In \appendixref{sec:shadow_training}, we discuss attacks that access only the training set distribution and not the training samples.}
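
The convergence argument behind the gradient-based features can be sketched as follows. The notation is illustrative and not taken from the experiments: $f_\theta$ denotes the trained model, $\ell$ the training loss, $D_{\mathrm{train}}$ the training set of size $n$, and $\tau$ a hypothetical threshold. Training with gradient descent until convergence drives the average training gradient towards zero,
\[
  \Bigl\| \frac{1}{n} \sum_{(x,y) \in D_{\mathrm{train}}} \nabla_\theta\, \ell\bigl(f_\theta(x), y\bigr) \Bigr\| \approx 0 .
\]
This condition constrains only the average. When the model additionally fits individual training samples closely, as overfitted networks tend to, the per-sample gradient norm
\[
  g(x, y) = \bigl\| \nabla_\theta\, \ell\bigl(f_\theta(x), y\bigr) \bigr\|
\]
is also small for training samples, while nothing constrains it for unseen samples. As one concrete case, assuming a squared-error loss $\ell = \bigl(f_\theta(x) - y\bigr)^2$ (an assumption; the loss is not specified in this section), the chain rule gives
\[
  \nabla_\theta\, \ell = 2\,\bigl(f_\theta(x) - y\bigr)\, \nabla_\theta f_\theta(x) ,
  \qquad
  g(x, y) = 2\,\bigl| f_\theta(x) - y \bigr| \, \bigl\| \nabla_\theta f_\theta(x) \bigr\| ,
\]
so under this assumption the gradient magnitude scales with the absolute prediction error and additionally carries sample-specific information through $\nabla_\theta f_\theta(x)$. A simple threshold attack would then predict membership when $g(x, y) \le \tau$; the attack models in this section are trained rather than fixed thresholds, so this rule only illustrates why the feature is informative.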