% REMOVED FOR MIDL
% \begin{figure}[ht]
%     \centering
%     \includegraphics[width=0.81\linewidth]{assets/teaser_midl.png}
%     \caption{Qualitative results for the aligner downstream task (\sectionref{sec:downstream}). The image shows the effect of balancing the training set using synthetic data. The model trained only with real data performs well with the fairly represented class, but struggles with the under-represented class. In contrast, when the training set is balanced, the performance is good across all classes.}
%     \label{fig:teaser}
% \end{figure}

\section{Introduction}
\label{sec:intro}


%What is the studied problem and why it is important?
In the medical domain, data plays a crucial role in diagnosis, treatment planning, and research. 
In recent years, Machine Learning (ML) models trained on such data have demonstrated strong potential in supporting clinical decisions, predicting disease outcomes, personalizing treatment plans, and assisting in medical imaging interpretation~\cite{javaid_significance_2022}.
% Being often highly sensitive, the handling of such data presents unique challenges, and unauthorized access or disclosure can lead to serious privacy violations, with potential legal and ethical implications. Therefore, stringent measures are required to ensure the confidentiality, integrity, and availability of this data. 
% Despite these challenges, the use of medical data is essential for advancing healthcare, and machine learning techniques are becoming extremely relevant in this domain. For instance, \textcolor{red}{Machine Learning (ML)} models can be trained on medical data to predict disease outcomes, personalize treatment plans, or assist in medical imaging interpretation~\cite{javaid_significance_2022}. 
However, these models often require large volumes of data, which can be difficult to obtain in practice. Medical data sharing is constrained by strict regulations, which vary across jurisdictions. Notable examples include the EU General Data Protection Regulation (GDPR), the US Health Insurance Portability and Accountability Act (HIPAA), and China's Personal Information Protection Law (PIPL). Moreover, the creation of large medical datasets is hindered by the high cost of annotations, which require expert clinicians and a significant time investment. To address these challenges, deep generative models can be used to generate synthetic medical data~\cite{han_breaking_2020, kazerouni_diffusion_2023}.
% , which can be used to train ML models in clinically relevant tasks.
% One solution to this problem is the use of synthetic data, which can be generated from real data and used for training models~\cite{han_breaking_2020, kazerouni_diffusion_2023}.
In addition to data scarcity, another issue with using ML models to address clinical tasks lies in the imbalanced nature of medical datasets.
Imbalanced data distributions can introduce bias and limit the generalizability of the trained models. However, balanced data are often not available, because the number of samples for a specific event, such as a disease or a particular medical examination, depends on how frequently that event occurs, which can vary significantly.
Similarly, medical datasets rarely provide uniform coverage of relevant clinical features such as age, sex, ethnicity, and disease stage.
Generative models, when trained with conditioning, could be used to mitigate this problem by synthesizing additional data for under-represented classes~\cite{Li_Iterative_MICCAI2024}.
% balancing datasets and thus making it easier to train neural networks for clinical downstream tasks. 


%What are the limitations of existing solutions?
Different generative techniques have been developed to advance image generation, including Variational Autoencoders (VAEs)~\cite{kingma2022autoencodingvariationalbayes}, Generative Adversarial Networks (GANs)~\cite{10.1145/gan}, and Denoising Diffusion Probabilistic Models (DDPMs)~\cite{ho_denoising_2020}. VAEs offer an explicit probabilistic formulation but are known to produce blurry images; GANs rely on adversarial training to produce high-quality samples, but suffer from mode collapse~\cite{che2017moderegularizedgenerativeadversarial}; DDPMs learn to reverse a diffusion process to generate images and, when trained in the latent space of an autoencoder~\cite{rombach_high-resolution_2022}, enable high-resolution image synthesis.
Recently, \citet{lipman2023flowmatchinggenerativemodeling} introduced Flow Matching (FM), a simulation-free approach for training Continuous Normalizing Flows. FM models, when optimized with the Optimal Transport learning objective (OTFM), have demonstrated improved sample quality compared to diffusion-based methods in the domain of 2D natural images~\cite{lipman2023flowmatchinggenerativemodeling, pmlr-v235-esser24a}.
However, whether the improvements observed with OTFM on 2D natural images translate to 3D medical image generation remains an open question, as most recent works in this domain rely on diffusion-based approaches~\cite{pinaya_brain_2022, khader_denoising_2023, friedrich_wdm_2024, wang20243dmeddiffusion3dmedical}.

% There exist different generative techniques such as Autoregressive models, Variational Autoencoders and Generative Adversarial \textcolor{red}{Networks} (GANs). Among the most recent approaches in this field are Latent Diffusion Models (LDMs)~\cite{rombach_high-resolution_2022}, which aim at generating novel samples using Denoising Diffusion Probabilistic Models (DDPMs)~\cite{ho_denoising_2020} trained in a latent space built through an autoencoder. Still, the literature pertaining image generation is rapidly evolving, and newer techniques have been proposed in the field of 2D natural images generation. In particular, Flow Matching (FM), a new paradigm for generative modelling, when optimized with Optimal Transport learning objective (OTFM), provides better generation quality~\cite{lipman2023flowmatchinggenerativemodeling, pmlr-v235-esser24a}. 
% However, most studies in the field of 3D medical data generation employ GANs or DDPMs~\cite{SUBRAMANIAM2022102396, kim_10452780, pinaya_brain_2022, khader_denoising_2023, friedrich_wdm_2024, wang20243dmeddiffusion3dmedical}.


In this work, we investigate the use of Flow Matching in the 3D medical imaging domain, specifically for the generation of 3D craniofacial skeletal data.
% REMOVED FOR MIDL
% This anatomical region encompasses complex structures such as nasal cavities, dental arches, orbits and temporomandibular joints, which require the generation to be very detailed in order to be considered anatomically plausible. 
We use our trained model to generate synthetic samples and show, through quantitative and qualitative analysis, that they capture the main anatomical structures.
We compare synthetic datasets generated using OTFM and DDPM, finding that OTFM generates more realistic 3D data than DDPM. Moreover, we show that OTFM leads to more robust generation, i.e., fewer samples with implausible anatomical structures.
Finally, we test our best synthetic dataset in two clinical downstream tasks, namely skull alignment and shape completion, assessing its utility in both augmenting and substituting real data. We also demonstrate that synthetic data can be used to balance datasets and thereby improve model performance.
% why are the findings important, what does it mean going forward?


% Add the limitations and shortcomings of these solutions? Or do we simply say that we intend to investigate the use of OT-FM in the context of 3D medical images, more specifically to generate 3D craniofacial skeletal data. The anatomy of this region is rather complex, and includes anatomical structures, such as the teeth or the condyle, that require a high level of detail.


% Besides the limited availability of data, another problem affecting medical data is that datasets are often unbalanced. This is mainly due to clinical causes, since the frequency of diseases is extremely variable and it is often not possible to collect the same amount of data for the different categories considered. Generative models could therefore be used to reduce this problem and thus make it easier to train models for clinical downstream tasks.

% To conclude, our contributions can be summarized as:
% We investigate the use of Flow Matching for the generation of 3D craniofacial structures, showing its effectiveness in generating realistic structures
% We compare synthetic datasets generated using both OT-FM and DDPM, showing that OT-FM surpasses DDPM in generating more realistic and robust images.
% We test the usefulness of the generated data in two clinical downstream tasks, shape completion and skull alignment, demonstrating their effectiveness both in augmenting existing datasets and in replacing them entirely when these are not available.


% Remove the memory part (and the architecture) from the intro

%Moreover, 3D images require a large amount of computational resources to be processed, which often leads to the downsampling of input images causing a significant decrease in the quality of the generation. The result is the creation of datasets that have a much lower resolution than the ones used in clinical contexts.
%What is the proposed solution, its advantages
%In this work we use OT-FM to generate synthetic 3D medical images and we compare this method to the results of the generation using DDPMs. Moreover, we improve the architecture of the autoencoder to enable it to be optimized with images at the highest available resolution. We test the results both qualitatively and quantitatively, and we evaluate the performance of the generated datasets in a medical downstream task.
%How does the proposed solution work
%What are the findings, why they are important, what does it mean going forward
%We show that OT-FM can be successfully used in the context of 3D medical images and that models trained with OT-FM generate higher quality images with respect to DDPMs. The better generation is accompanied by greater robustness, i.e. OT-FM models never fail in generating meaningful data while DDPMs sometimes do. The results on the downstream task confirm the quality of the images and prove that the generation quality is high enough for the synthetic data to be used to augment or substitute real data.
