\section{Introduction}

Deep learning methods can be beneficial for medical applications, but often suffer from limited data availability~\cite{bowles_gan_2018}. Generating and sharing synthetic datasets has been suggested as a viable solution~\cite{dube_approach_2014}.

Many approaches can synthesize medical images~\cite{bowles_gan_2018, yi_generative_2019}; fewer jointly produce segmentation maps~\cite{guibas_synthetic_2018, greenspan_medgen3d_2023}, which are required for training downstream segmentation models. To the best of our knowledge, no existing method can be applied to a new task without manually adjusting the hyperparameters of training or data preprocessing.

A hyperparameter-free adaptive method removes the need for manual tuning on each new task. Such a method was realized for segmentation tasks by nnU-Net~\cite{isensee_automated_2021}, which automatically adjusts the architecture and hyperparameters of a U-Net~\cite{ronneberger_u-net_2015} and has shown excellent and robust performance in tens of challenges~\cite{isensee_automated_2021}.

Automatic adaptation of a high-quality underlying model and training pipeline is a general idea that could be extended to other tasks, such as medical image synthesis. A Generative Adversarial Network (GAN) called StyleGAN2~\cite{karras_analyzing_2020} demonstrated good results in medical image synthesis~\cite{woodland_evaluating_2022}. However, it currently does not automatically adapt to the image dimensions or the dataset size.

In this paper, we introduce an automatically adjustable StyleGAN2 setup and integrate it with nnU-Net to create a \textbf{Hy}perparameter-\textbf{Free} medical image \textbf{S}ynthesis, \textbf{S}haring, and \textbf{S}egmentation method called \textbf{HyFree-S3}. We construct it as a distributed learning method where each site (e.g., a hospital) can automatically and asynchronously create a synthetic dataset and share it. A segmentation model can be automatically trained on the merged synthetic data and distributed back to the sites to be further automatically fine-tuned for improved performance on local data (see \figureref{fig:method}).

This approach to distributed learning has two practical advantages: it requires minimal coordination between sites, and it reduces privacy risk because neither real data, nor models trained on it, nor their gradients are shared (gradients are a potential source of data leakage in federated learning~\cite{zhu_deep_2019}). An important concern is whether synthetic data includes memorized real data. We address it with a quantitative and qualitative investigation, as well as a technique for ensuring that synthetic data is not too similar to the real data.

We evaluate our method in three segmentation settings (pelvic MRIs, lung X-rays, polyp photos) to test its generality, the impact of synthetic data sharing, and the difference in performance compared to the realistic baseline of using only local data and the strong baseline of having central access to all the real data.

In this paper, we consider only 2D models: compared to 3D models, they have lower computational requirements and need less training data (which is important for the data sharing setting, where some sites could have small datasets). In the future, our approach could be extended to 3D models for improved segmentation performance in settings where enough data and computational resources are available.

The contributions of this paper are as follows:
\vspace{-4pt}
\begin{itemize}
    \item We propose HyFree-S3, a hyperparameter-free distributed learning method integrating image synthesis, data sharing, and segmentation.
    \vspace{-6pt}
    \item Towards that goal, we introduce a hyperparameter-free StyleGAN2 setup that can adapt to various image dimensions and dataset sizes.
    \vspace{-6pt}
    \item The segmentation quality of HyFree-S3 is evaluated in three settings. Furthermore, its scaling behavior and ability to avoid data memorization are investigated.
\end{itemize}
\vspace{-12pt}