\documentclass{midl} % Include author names

% The following packages will be automatically loaded:
% jmlr, amsmath, amssymb, natbib, graphicx, url, algorithm2e
% ifoddpage, relsize and probably more
% make sure they are installed with your latex distribution
\usepackage{xurl}
\usepackage[normalem]{ulem}

\usepackage{mwe} % to get dummy images
\usepackage{xcolor}
\definecolor{blue}{HTML}{1F77B4} 
\definecolor{orange}{HTML}{FF7F0E} 
\definecolor{green}{HTML}{2CA02C} 
\definecolor{red}{HTML}{D62728} 
\definecolor{purple}{HTML}{9467BD} 
%\newcommand{\mup}[1]{\textcolor{red}{#1}} %creates a red markup
\newcommand{\mup}[1]{#1} %changes textcolor to black again
\renewcommand{\sout}[1]{} %removes strikeout text

\jmlryear{2024}\jmlrworkshop{Full Paper -- MIDL 2024}\jmlrvolume{-- nnn}\editors{Accepted for publication at MIDL 2024}
%\jmlrvolume{-- Under Review}
%\jmlryear{2024}
%\jmlrworkshop{Full Paper -- MIDL 2024 submission}
%\editors{Under Review for MIDL 2024}

\title[Neural obfuscation for privacy preservation]{Implicit neural obfuscation for privacy preserving medical image sharing}

 % Use \Name{Author Name} to specify the name.
 % If the surname contains spaces, enclose the surname
 % in braces, e.g. \Name{John {Smith Jones}} similarly
 % if the name has a "von" part, e.g \Name{Jane {de Winter}}.
 % If the first letter in the forenames is a diacritic
 % enclose the diacritic in braces, e.g. \Name{{\'E}louise Smith}

 % Two authors with the same address
 % \midlauthor{\Name{Author Name1} \Email{abc@sample.edu}\and
 %  \Name{Author Name2} \Email{xyz@sample.edu}\\
 %  \addr Address}

 % Three or more authors with the same address:
 % \midlauthor{\Name{Author Name1} \Email{an1@sample.edu}\\
 %  \Name{Author Name2} \Email{an2@sample.edu}\\
 %  \Name{Author Name3} \Email{an3@sample.edu}\\
 %  \addr Address}


% Authors with different addresses:
% \midlauthor{\Name{Author Name1} \Email{abc@sample.edu}\\
% \addr Address 1
% \AND
% \Name{Author Name2} \Email{xyz@sample.edu}\\
% \addr Address 2
% }

%\footnotetext[1]{Contributed equally}

% More complicate cases, e.g. with dual affiliations and joint authorship
\midlauthor{\Name{Mattias P. Heinrich\nametag{$^{1}$}} \Email{mattias.heinrich@uni-luebeck.de}\\
\addr $^{1}$ Institute of Medical Informatics, Universit\"{a}t zu L\"{u}beck, Germany \AND
\Name{Lasse Hansen\nametag{$^{2}$}} \Email{lasse@echoscout.ai}\\
\addr $^{2}$ EchoScout GmbH, L\"{u}beck, Germany
}

\begin{document}

\maketitle

\begin{abstract}
Despite its undeniable success, deep learning for medical imaging with large public datasets carries an often overlooked risk of leaking sensitive patient information. A person's {X-ray}, even with proper anonymisation applied, can readily serve as a fingerprint and would enable a highly accurate re-identification of the same individual in a large pool of scans. Common practices for reducing privacy risks involve a synthetic deterioration of image quality, e.g. by adding noise or downsampling images, before sharing them publicly. Yet, this also adversely affects the quality of downstream image recognition models trained on such datasets. We propose a novel strategy for finding a better compromise between model quality and privacy preservation by means of implicit neural obfuscation. Our method jointly overfits a neural network to a small batch of patients' X-ray scans and applies a substantial compression: the number of network parameters representing the images is more than 6x smaller than the original images. In addition, we introduce a k-anonymity mixing that injects partial information from other patients into each reconstruction. That way, identifiable information is efficiently obfuscated, while we maintain the quality of relevant image parts for the intended downstream task. Experimental validation on the public RANZCR CLiP dataset demonstrates improved segmentation quality and up to 3 times reduced privacy risks compared to more basic image obfuscation baselines. In contrast to other recent work that learns specific anonymous representations, which no longer resemble visually meaningful scans, our approach remains interpretable and is not tied to a certain downstream network. Source code and a demo dataset are available at \url{https://github.com/mattiaspaul/neuralObfuscation}.

\end{abstract}

\begin{keywords}
neural implicit representation, anonymisation, obfuscation, image sharing 
\end{keywords}
\section{Introduction / Motivation}
The trend towards larger models for image recognition, in particular vision transformers, has exemplified the need for training with millions of images at the same time. While the advent of grand challenges in medical imaging has led to an ever increasing amount of public CTs, MRIs and X-rays, their number is still orders of magnitude smaller than that of natural image databases (e.g. LVD-142M or SA-1B). Yet, hundreds of millions of digitised scans \cite{schockel2020developments} are acquired and stored in local clinical picture archives each year. The vast majority of them are never shared (anonymously) with the research community, one likely strong reason being privacy concerns and tighter regulations \cite{mostert2016big}. Despite its benefit of restricting direct access to personal information, the current process of image anonymisation or pseudonymisation is far from perfect \cite{kaissis2020secure}. \cite{packhauser2022deep} revealed a severe risk of re-identification \textit{even if rigorous anonymisation of images is performed}, which may enable an attacker to find a person with probabilities as high as 90\% within a large public dataset given another X-ray of them. In fact, millions of scans together with medical reports have already been leaked due to poor IT security at some hospitals\footnote{\url{https://www.blackhat.com/eu-23/briefings/schedule/index.html#millions-of-patient-records-at-risk-the-perils-of-legacy-protocols-34188}}; these could be linked to anonymised data and increase the risk of re-identification attacks even further. Our objective is hence to devise a safer mechanism that enables anonymous image data release with substantially reduced re-identification risk, while the released data retains its diagnostic value for a given intended downstream task, e.g. semantic segmentation.
\section{Related work}
Much research has been devoted to de-identifying individuals in natural images or video sequences. Since visual re-identification risks pose a severe challenge for compliance with current data privacy regulation, obfuscation strategies have been devised that modify images to make persons harder to identify. The DP-Net \cite{Fan2018image} explores blurring, black/white boxes as well as adversarially learned degradations (cf. also \cite{wu2018towards}) to maintain the targeted downstream task performance while reducing privacy leakage. \cite{zhu2020deepfakes} and \cite{dall2022graph} propose to create synthetic image replacements (DeepFakes for de-identification) to preserve privacy in medical videos while retaining diagnostic features for downstream tasks, i.e. preserving keypoints. Advanced methods for video-based person re-identification have been developed in \cite{mclaughlin2016recurrent}. \cite{kim2021privacyBMVC} and \cite{packhauser2023deep} proposed to learn certain geometric deformations that make the re-identification of brain MRI or chest X-rays with retrieval learning much harder. Latent diffusion models are explored in \cite{packhauser2023generation} to create replica datasets that show only moderate performance drops when training models for downstream abnormality classification, while enhancing privacy preservation. \cite{kim2021privacy} propose a Privacy-Net that jointly learns to map input MRI brain scans into an intermediate privacy-preserving representation, trains a semantic parcellation U-Net and also minimises the re-identification risk. While showing excellent results for the given tasks, this procedure requires access to paired patients for each annotation (which is often not fulfilled) and leaves the intermediate representations not interpretable for humans.
Mixup-privacy \cite{kim2023mixup} is another strategy aimed at avoiding full knowledge transfer between client and server. Both Privacy-Net and Mixup-privacy can therefore be more closely associated with recent differential privacy approaches in federated learning \cite{rieke2020future}, which could also be supplemented by encryption with mathematical security guarantees \cite{kaissis2021end}.
%Membership inference is another risk factor that even affects releasing trained models without raw training data, where e.g. the training of shadow models showed vulnerabilities for small scale image datasets [Shokri 2017]. 
%Packhaeuser (also generative), Pixelation DP-Net [Fan], Privacy-Net Kim, k-anonymity, synthetic data sharing
The concept of k-anonymity, which mixes information from several identities in a single output datapoint, can be seen as a particularly promising strategy to strike a good balance between privacy preservation, downstream task performance and interpretability of the obfuscation. \cite{meden2018k} compare several approaches for k-anonymity, including k-Same-Pixel \cite{newton2005preserving} and a newly proposed k-Same-Net, along with basic pixelation strategies for face photos. They demonstrate good performance for learning to generate synthetic images that share attributes from multiple persons but retain specific labels (age, gender, facial expression).
\paragraph{Contribution:}
Our method advances the state-of-the-art in effective medical image obfuscation strategies with regard to the following three main points:
\begin{itemize}
\item robust generative model, by adapting recent work on neural implicit representation and compression for video sequences to the obfuscation of a subset of an X-ray collection,
\item novel strategy for k-anonymity that only moderately affects visual image quality while substantially reducing re-identification risks, and
\item alleviation of the strong requirements of prior work, which relies on the simultaneous availability of multiple scans per patient at each data provider.
\end{itemize}
Along with these technical contributions, we advance the field of privacy-aware medical deep learning with comprehensive experiments that evaluate privacy risks alongside downstream task performance (semantic segmentation of catheters in X-rays) for baselines compared to our proposed model. Furthermore, we provide reproducible code for public Kaggle challenge data so that others can replicate and build upon our work.
\begin{figure}[htbp]
\floatconts
  {fig:concept}
  {\caption{Concept figure of the proposed implicit neural obfuscation strategy. A number of input chest X-rays serve as targets for a neural reconstruction decoder that comprises learnable instance embeddings (a D-dimensional vector for each data point) and convolutional weights. The reconstructions are supervised with a loss based on structural image similarity (SSIM). During inference, a k-anonymity mixing is introduced that aims to obfuscate patient information by adding latent code information from other patients.}}
  {\includegraphics[width=\linewidth]{midl2024_neural_obfuscation_concept.pdf}}
\end{figure}
\section{Methods}
Our study comprises three aspects: image obfuscation, semantic X-ray segmentation and siamese network re-identification. The concept is implemented within the following scenario. Several data providers want to contribute anonymised X-ray scans along with detailed expert annotations of clinically relevant objects. Here, we use pixel-level segmentations of foreign material, in particular central venous catheters (CVC), which are commonly used to detect critical malpositioning \cite{roldan2015central}. We assume that part of the combined dataset comprises images with the same patient pseudonym that can be used to train a siamese retrieval network, which will be used to assess the re-identification risk. Crucially, however, not every image has to be annotated with CVC labels, nor does every patient have to be present multiple times. Hence, we do not assume the possibility of jointly training an image obfuscation strategy to de-identify patients along with the segmentation task, but rather require the obfuscation to work as a stand-alone step. In addition, and in contrast to \cite{kim2021privacy} and \cite{packhauser2023deep}, we define the obfuscation strategy to be a white-box model that is accessible to the potential attacker, since having to keep such methods hidden from the public while sharing them across multiple clinics would pose another severe risk/challenge.
Our main contribution lies in the development of a novel strategy for creating partially k-anonymous scans using neural implicit compression for open data sharing that preserves the relevant features to train semantic segmentation networks. The employed semantic segmentation and siamese re-identification methods are nevertheless described as well for completeness.
\paragraph{Implicit neural obfuscation:} We base our work on the recent NeRV approach for neural representations for video compression \cite{chen2021nerv}. Implicit Neural Representations (INRs) are rapidly gaining attention as effective image representations that, amongst others, enabled performance leaps for 3D reconstruction \cite{mildenhall2021nerf}, image compression \cite{strumpler2022implicit} or alignment \cite{lin2021barf,wolterink2022implicit}.

The key observation is that a low parameterisation of a fully-connected or convolutional network is sufficient to represent images based on an input of a positional encoding. Extending INRs to larger datasets (e.g. through amortised learning \cite{sitzmann2020implicit}) is not trivial, yet several newer approaches either employ learnable encoders \cite{kim2023generalizable} to predict a latent code embedding for each image or simply keep a dictionary of embedding vectors. \cite{chen2021nerv} implement the latter and learn a compact decoder model to restore a video sequence. They clearly demonstrate that, in contrast to traditional autoencoders, which have a shared encoder for the whole dataset, NeRV improves reconstruction quality by training a new model for each subset (in their work, a short video clip).
For our approach, we adopt this concept and fit a NeRV to each chunk of 64 images in our data set. We specify the decoder to start from a 64-dimensional latent vector that is mapped with a fully-connected layer into a 16-channel $3\times 3$ latent code and then upsampled with convolutions and pixel-shuffle operations to a target image size of $360\times 360$ pixels. We first experimented with a mean-squared error reconstruction loss (traditionally used in autoencoders to mimic a maximum likelihood model), yet this led to unsatisfactory results. Minimising the structural dissimilarity index (maximising SSIM) \cite{woods1998automated}, however, achieves high quality reconstructions with good convergence. The concept is presented in detail in Fig.~\ref{fig:concept}.

Next, we introduce a k-anonymity mixer into the inference path of our NeRV-image reconstruction. An $N\times N$ matrix, which is the sum of an identity and Gaussian noise with a hyperparameter $\rho$ controlling the standard deviation, is multiplied with the instance embeddings. That way, the latent codes share information from other patients in the same mini-batch. Because the noise is injected at the lowest level of the convolutional decoder, it also affects global contextual image content and will ideally mask a substantial amount of identifiable information. This step is only performed at inference, once a subset of images has been fully fitted, to avoid the risk of learning to reintroduce personal fingerprints.
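
As a minimal sketch of this mixing step (the symbols $Z$, $E$ and $\tilde{Z}$ are notation introduced here for illustration only, and i.i.d. zero-mean, unit-variance entries of $E$ are an assumption), let $Z\in\mathbb{R}^{N\times D}$ stack the fitted instance embeddings row-wise. The obfuscated embeddings could then be written as
\[
\tilde{Z} = (I_N + \rho E)\,Z, \qquad E_{ij}\sim\mathcal{N}(0,1),
\]
so that each row becomes $\tilde{z}_i = z_i + \rho\sum_{j=1}^{N} E_{ij}\,z_j$. Setting $\rho=0$ recovers the unmodified reconstruction, while a larger $\rho$ increases the share of information from other patients in each latent code.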
\paragraph{Catheter segmentation:}
We opt to use semantic segmentation of catheters as the downstream task, due to its clinical relevance paired with challenges for obfuscated images. Central venous catheters are extremely thin foreign objects that typically form an elliptic curve that ends in the vena cava. We employ a straightforward 2D SegResNet model (using the MONAI implementation) \cite{myronenko20193d}. A unit-weighted combination of soft Dice loss and binary cross entropy (after sigmoid activation) is used to train the network with pixel-level supervision. Note that we always assume high-quality annotations are available and do not deteriorate labels, as they pose a very limited risk for re-identification.
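
Written out under common definitions (an illustrative sketch; the exact soft Dice variant and any smoothing constants of the implementation are not specified here and are therefore assumptions), with logits $x_i$, probabilities $p_i=\sigma(x_i)$ and binary labels $y_i\in\{0,1\}$ over $n$ pixels, such a unit-weighted loss reads
\[
\mathcal{L}_{\mathrm{seg}} = \underbrace{1-\frac{2\sum_{i} p_i\,y_i}{\sum_{i} p_i+\sum_{i} y_i}}_{\text{soft Dice}}
\;\underbrace{-\,\frac{1}{n}\sum_{i}\bigl[y_i\log p_i+(1-y_i)\log(1-p_i)\bigr]}_{\text{binary cross entropy}} .
\]
The Dice term is insensitive to the large number of background pixels, which is helpful for thin structures such as catheters, whereas the cross entropy term provides a gradient for every pixel.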
\paragraph{Re-identification:}
We implement a classic siamese re-identification network \cite{taigman2014deepface} that comprises two identical ResNet34 streams, which produce $D$-dimensional feature encodings for each image within a mini-batch of size $N$. A cosine similarity is applied to produce a $2N\times2N$ score matrix, which is fed into the objective function, a noise-contrastive estimation loss (InfoNCE) \cite{oord2018representation}. This loss aims to maximise the similarity of the only positive example among the $2N-1$ candidates for each image.
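
In its standard form (a sketch; the index map $p(i)$, the temperature symbol $\tau$ and the averaging over all $2N$ anchors are notational assumptions made here), with $s_{ij}$ denoting the cosine similarity between the encodings of images $i$ and $j$ and $p(i)$ the index of the other image of the same patient, the InfoNCE objective can be written as
\[
\mathcal{L}_{\mathrm{InfoNCE}} = -\frac{1}{2N}\sum_{i=1}^{2N}\log\frac{\exp\bigl(s_{i,p(i)}/\tau\bigr)}{\sum_{j\neq i}\exp\bigl(s_{ij}/\tau\bigr)} .
\]
The sum in the denominator runs over exactly the $2N-1$ candidates of image $i$, of which only $j=p(i)$ is a positive match.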
\section{Experiments and Results}
The data was obtained from the Kaggle RANZCR CLiP challenge\footnote{\url{https://www.kaggle.com/c/ranzcr-clip-catheter-line-classification/data}}. We follow a similar pre-processing as \cite{hansen2021radiographic} in that we first predict lung masks for each X-ray and automatically define a suitable bounding box for each scan. The images and labels are resampled to $384\times 384$ pixels and the CVCs are dilated to approx. 5 pixels. The whole dataset comprises $>$10\,000 images, but we select the subset of datapoints that either contain a normal CVC annotation or belong to a patient that occurs at least twice, so that the re-identification risk can be evaluated. This yields 1536 training and 512 test scans for CVC segmentation, as well as 576 patients with 1152 scans for training and 384 patients with 768 images for testing in the re-identification risk evaluation (note: the sets do not have to be disjoint).

For the image reconstruction/obfuscation, we keep the architectural design of the public NeRV repository\footnote{\url{github.com/haochen-rye/NeRV}}, yet we substantially decreased the capacity of the model to avoid overfitting. In initial experiments, we aimed for approx. 500k trainable parameters per batch of 128 images, which yields a compression of $>$33-fold when assuming the same quantisation of model weights and image pixels and image dimensions of $360\times 360$ pixels. However, this resulted in an under-fitting of the reconstructed images with missing details. Hence, we opted for tripling the parameter count and storing 64 images per NeRV, which is still a considerable 6-fold compression and yields PSNRs of, on average, approx. 40\,dB. Such a high agreement with the original data will obviously make the re-identification of the same person easier and hence decrease the desired privacy preservation. It is therefore crucial to select a suitable noise parameter $\rho\in\{0, 0.04, 0.06, 0.08\}$ for the proposed k-anonymity mixing.
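
As a worked check of these ratios (assuming, as stated above, the same bit depth for model weights and image pixels), the initial configuration gives
\[
\frac{128\cdot 360^2}{5\cdot 10^{5}} = \frac{16\,588\,800}{500\,000}\approx 33.2 ,
\]
which matches the $>$33-fold figure. For 64 images per NeRV, the pixel count is $64\cdot 360^2 = 8\,294\,400$, so a 6-fold compression corresponds to roughly $8\,294\,400/6 = 1\,382\,400$ parameters. The tripling of the parameter count should therefore be read as approximate; the exact number of parameters is not stated here.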

As a baseline obfuscation strategy, we employ pixellation. That means a range of compressed versions of the input images are obtained by downsampling the input images by factors of $\{1, 2, 4, 6\}$ and resampling them afterwards to the original resolution. We expect that both the segmentation quality and the re-identification risk of models trained with these degraded images will be lowered.
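
For orientation, the implied reduction in pixel count can be worked out as follows (a sketch assuming the $384\times 384$ input size stated above and a downsampling factor $f$ applied along each axis):
\[
f\in\{1,2,4,6\}\;\Rightarrow\;\tfrac{384}{f}\in\{384,192,96,64\}, \qquad f^{2}\in\{1,4,16,36\},
\]
i.e. the strongest pixellation retains only $1/36$ of the original pixels before resampling.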

For the 2D SegResNet, we chose 24 initial feature channels with 3.5 million parameters in total. The model is trained with a batch size of 32 for 375 epochs (the number of training images is 1536). We use Adam with an initial learning rate of $2\cdot10^{-3}$ that is reduced by half every 1500 iterations and restarted every 4500 iterations. The first 1000 iterations are stabilised with an additional heatmap loss. We employ the \texttt{RandomPhotometricDistort} and \texttt{RandomErasing} augmentations from Torchvision (v2) and add affine geometric transformations (with a standard deviation of $7\cdot10^{-2}$) and random horizontal flipping to both images and labels. At test time, we only include the horizontal flip, hence two predictions are averaged per input.
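
The resulting schedule can be made explicit with a short calculation (a sketch; the symbol $\eta(t)$ and the assumption that each restart resets the learning rate to its initial value are introduced here for illustration). With 1536 training images and a batch size of 32, one epoch comprises $1536/32 = 48$ iterations, so that 375 epochs amount to $375\cdot 48 = 18\,000$ iterations, or $18\,000/4500 = 4$ restart cycles. Under these assumptions, the learning rate at iteration $t$ is
\[
\eta(t) = 2\cdot10^{-3}\cdot\left(\tfrac{1}{2}\right)^{\lfloor (t \bmod 4500)/1500\rfloor},
\]
i.e. it takes the values $2\cdot10^{-3}$, $10^{-3}$ and $5\cdot10^{-4}$ within each cycle.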

For training the siamese re-identification network, we follow common practices of contrastive self-supervised learning and use an ImageNet-pretrained ResNet34 with batches of 32 image pairs. The output feature size is fixed to 256 channels and the InfoNCE loss with cosine similarity and a temperature of $7\cdot10^{-2}$ is used as the objective. Adam was used with an initial learning rate of $10^{-3}$ for 444 epochs with a single step of 0.2 at half-time. The same augmentations are used as before for the SegResNet. This time, we also include them for testing, as we found that otherwise none of the models could cope with the large diversity of scanning parameters and/or geometric misalignment. We average 25 predictions for each pair of potential matches. Having 384 patients in the test set, we compute the risk of re-identification for a single other image of that same person in the training using any number of guesses from 1 to 15, meaning e.g. the random chance at top-5 would be about 1\%.
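
The chance level follows from a simple count (a sketch assuming exactly one matching candidate among 384 equally likely patients; the symbol $P_{\mathrm{chance}}$ is introduced here for illustration):
\[
P_{\mathrm{chance}}(\text{top-}k) = \frac{k}{384}, \qquad
\frac{1}{384}\approx 0.26\%, \quad \frac{5}{384}\approx 1.3\%, \quad \frac{15}{384}\approx 3.9\% .
\]
Any top-$k$ re-identification rate should be judged relative to these values.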

For both experiments (segmentation and re-identification), we evaluated whether the models trained with obfuscation perform better with original test scans or with the same modulation (here, a higher re-identification rate counts as better for the attack model, even though it is a worse outcome for the obfuscation algorithm). We found that using obfuscated images at test time works best in all instances, likely because the models have learned to adapt to these images. Crucially, all reported re-identification attacks were also retrained with the same obfuscation strategies to adapt to the knowledge of the defence mechanism.
All models are trained on RTX A4000 cards with bfloat16 precision and \texttt{torch.compile}, with a typical run time of 1 hour. Further implementation details can be found in our public source code at: \url{https://github.com/mattiaspaul/neuralObfuscation}.
\begin{figure}[htbp]
\floatconts
  {fig:visual1}
  {\caption{Visual result comparing both obfuscation and downstream performance. From left to right: \textcolor{blue}{$\blacksquare$} original image with ground truth segmentation; \textcolor{red}{$\blacksquare$} pixellation to $\frac{1}{6}$ resolution; \textcolor{green}{$\blacksquare$} NeRV-based obfuscation with $\rho=0.04$; and \textcolor{purple}{$\blacksquare$} with $\rho=0.08$. Clearly, the neural obfuscation better balances personal de-identification and diagnostic quality. More results are found in the appendix.}}
  {\includegraphics[width=\linewidth]{example_segout_36_48.png}}
\end{figure}
\paragraph{Segmentation Results: }
Apart from the strongest pixellation variant, all approaches perform reasonably well, with average Dice scores above 75\% on the challenging downstream task. Remarkably, the NeRV obfuscation with $\rho=0.08$, which introduces strong visible artefacts, still produces predominantly high quality segmentations, as visualised in Figs.~\ref{fig:visual1} and \ref{fig:visual2}. The highest overall performance of 86\% is reached by using the original images, followed by NeRV without k-anonymity and half-resolution scans with 83\% each.

\begin{figure}
  \centering
  \begin{tabular}[b]{c}
    \includegraphics[width=.46\linewidth]{ranzr_cvc_new_nervR_imgsizes_flip64.pdf} \\
    \small (a) Segmentation quality
  \end{tabular} 
  \begin{tabular}[b]{c}
    \includegraphics[width=.46\linewidth]{ranzr_reident_pair_nerv_R_64.pdf} \\
    \small (b) Re-identification risks
  \end{tabular}
  \caption{Comparison of cumulative statistics of segmentation quality vs. re-identification risk. NeRV obfuscation with $\rho=0.04$ is on par for CVC segmentations while posing a 50\% lower privacy risk (at top-5) than pixellation with half-resolution (indicated as 192).}
  \label{fig:results}
\end{figure}
\paragraph{Re-identification Evaluation: }
Putting the segmentation scores into context with privacy preservation, it becomes evident that only the strongest pixellation, which has poor downstream performance, comes close (with a top-5 risk of 47\%) to the reduction in re-ID risk achieved by the implicit neural obfuscation. When increasing $\rho$ from $0.04$ to $0.08$, the top-5 risk decreases from 42\% to 32\%. The contrast is particularly stark for top-1 re-identification: $>$60\% with original images compared to $<$20\% with our NeRV with $\rho=0.08$, a three-fold improvement. Choosing $\rho$ depends on the intended use case and privacy-risk assessment. Since pre-trained task and re-identification models can be quickly evaluated with different values of $\rho$ for a given NeRV model, this provides a first indication of suitable choices. A re-training of both networks is, however, required to validate this assessment.
\section{Discussion and Conclusion}
Our study demonstrates that privacy risks are imminent for anonymous medical image data sharing, but that they can be addressed by a suitable neural obfuscation strategy with negligible performance drops of the models trained and evaluated with such data. The white-box model leads to interpretable outputs and does not impact the process of training downstream models, since normal images can be shared. It is computationally lightweight, requiring on average one second to fit a NeRV per image (one minute for a batch of 64). The compression of $>6$-fold also brings benefits for a more efficient data transfer. This is the first time neural implicit obfuscation is used for interpretable X-ray segmentation, and the proposed introduction of k-anonymity yielded a large improvement in risk reduction.

\paragraph{Limitations:} There are limitations with regard to the employed comparative methods, since we restricted them to be viable in a scenario where not all labelled data has to be available with multiple scans per patient during training. Where such data is available, even stronger performance could be achieved by specifically optimising de-identification together with segmentation. We also wanted to avoid black-box obfuscation models that have to be kept secure for further anonymisation steps, e.g. at another clinical centre. This is not strictly necessary if all data comes from one hospital. Further initial experiments that extend the baseline comparisons to deformation-based obfuscation and mixup-privacy, as well as an extended evaluation of our NeRV-based approach on digitally reconstructed radiographs (DRRs) and another downstream segmentation task, can be found in our GitHub repository and Supplementary material.

\paragraph{Future work:} It is not yet clear how such an approach could be extended to sharing volumetric scans. 3D CTs and MRI comprise substantially more anatomical detail and could thus lead to even greater privacy risks. There are also other promising strategies to learn implicit image embeddings, e.g. using generalisable INRs \cite{kim2023generalizable}. While being more complex to train, using meta-learning, they could decouple larger parts of the neural networks between shared and instance-based parameters and hence provide more control over the level of compression and obfuscation. It also remains to be seen whether a dataset with NeRV-based k-anonymity still excels at other tasks of chest pathology detection.
% Acknowledgments---Will not appear in anonymized version
%\midlacknowledgments{We thank a bunch of people.}


\bibliography{midl2024}

\newpage

\appendix

\begin{figure}
  \centering
  \includegraphics[width=\linewidth]{example_segout_35_48.png}
  \includegraphics[width=\linewidth]{example_segout_32_48.png}
  \includegraphics[width=\linewidth]{example_segout_37_48.png}
  \includegraphics[width=\linewidth]{example_segout_39_48.png}
  \includegraphics[width=\linewidth]{example_segout_43_48.png}
  \caption{Additional results comparing both obfuscation and downstream performance. From left to right: \textcolor{blue}{$\blacksquare$} original image with ground truth segmentation; \textcolor{red}{$\blacksquare$} pixellation to $\frac{1}{6}$ resolution; \textcolor{green}{$\blacksquare$} NeRV-based obfuscation with $\rho=0.04$; and \textcolor{purple}{$\blacksquare$} with $\rho=0.08$.}
  \label{fig:visual2}
\end{figure}

There are certain restrictions (see paragraph Limitations in the main paper) that have limited our baseline comparisons for privacy-preserving data sharing experiments: the work of Packh\"{a}user et al. \cite{packhauser2023deep}, for example, assumes access to an already trained task model to perform obfuscation, and some aspects of Kim et al. \cite{kim2023mixup} are set within a scenario where not all data from multiple providers is publicly shared. We designed two additional experiments inspired by this state-of-the-art with certain adaptations to our setting: 1) deformation-based obfuscation and 2) mix-up privacy; details of which can be found in the revised appendix. For 1), we create smooth, invertible local deformations to obscure the identity and do indeed see a 20\% drop in top-5 re-identification risk. However, when re-training the attack model with knowledge about deformations (online augmentations), the risk increases again by 10\%, making it 30\% less safe than NeRV with $\rho=0.06$. For 2), the scenario of sharing mix-up versions of images and labels without requiring a dual client-server training setup is more challenging (but also possible) and leads to a great reduction of privacy risks (by a factor of 2 or 4 in our tests). We could, however, not avoid a substantial drop in segmentation accuracy to about 40\% Dice score for 4-fold mix-up (lower than the strongest pixellation strategy). We attribute this to the fact that jointly training on multiple images with similar-looking thin foreign objects (catheters, tubes, electrodes) is less stable than brain segmentation. Future work could strengthen a combination of these orthogonal strategies.
In addition, we included two proof-of-concept experiments on different datasets in the public repository (\url{https://github.com/mattiaspaul/neuralObfuscation}). They demonstrate the transfer of our method with the same hyper-parameters onto a slightly different modality and a new downstream segmentation task. 1) We created DRRs (digitally reconstructed radiographs) of a public paired thorax CT dataset and evaluated the reduction in re-identification risk (top-5) from 72.92\% to 53.12\% using NeRV (with $\rho=0.08$). 2) Due to the absence of fine-grained structures in large-scale X-ray databases (e.g. \url{https://github.com/ngaggion/CheXmask-Database} only provides masks for lungs and heart), we evaluated the possibility of segmenting the clavicles in the Montgomery County CXR dataset \cite{brioso2023semi}. This also led to satisfactory quantitative results of 81\% Dice for qualitatively strongly obfuscated images. The repository also contains code for data preparation to further extend the experiments once more comprehensive data becomes available.

\end{document}