\documentclass{midl} % Include author names

% The following packages will be automatically loaded:
% jmlr, amsmath, amssymb, natbib, graphicx, url, algorithm2e
% ifoddpage, relsize and probably more
% make sure they are installed with your latex distribution

\usepackage{mwe} % to get dummy images
\jmlrvolume{-- Under Review}
\jmlryear{2024}
\jmlrworkshop{Full Paper -- MIDL 2024 submission}
\editors{Under Review for MIDL 2024}
\usepackage{longtable}
\usepackage{booktabs}
\usepackage{multirow}
\usepackage{hyperref}
\title[Re-DiffiNet for tumor segmentation]{Re-DiffiNet: Modeling discrepancies in
tumor segmentation using diffusion models}

\midlauthor{\Name{Tianyi Ren\midljointauthortext{Contributed equally}\nametag{$^{1}$}} \Email{tr1@uw.edu}\\
\Name{Abhishek Sharma\midlotherjointauthor\nametag{$^{1}$}} \Email{as711@uw.edu}\\
\Name{Juampablo Heras Rivera\nametag{$^{1}$}} \Email{jehr@uw.edu}\\
\Name{Harshitha Rebala \nametag{$^{2}$}} \Email{lhrebala@uw.edu}\\
\Name{Ethan Honey\nametag{$^{1}$}} \Email{ehoney22@uw.edu}\\
\Name{Agamdeep Chopra\nametag{$^{1}$}} \Email{achopra4@uw.edu}\\
\Name{Jacob Ruzevick\nametag{$^{3}$}} \Email{ruzevick@neurosurgery.washington.edu}\\
\Name{Mehmet Kurt\nametag{$^{1}$}} \Email{mkurt@uw.edu}\\
\addr $^{1}$ Department of Mechanical Engineering, University of Washington \\
\addr $^{2}$ Paul G. Allen School of Computer Science, University of Washington \\
\addr $^{3}$ Department of Neurological Surgery, University of Washington \\
}

\begin{document}

\maketitle

\begin{abstract}
 Identification of tumor margins is essential for surgical decision-making in glioblastoma patients, and reliable automated segmentation can provide valuable assistance to neurosurgeons. Despite improvements in deep learning architectures for tumor segmentation over the years, creating a fully autonomous system suitable for clinical practice remains a formidable challenge, because model predictions have not yet reached the accuracy and generalizability required for clinical applications. Generative modeling techniques have improved significantly in recent years. Specifically, Generative Adversarial Networks (GANs) and Denoising Diffusion Probabilistic Models (DDPMs) have been used to generate higher-quality images with fewer artifacts and finer attributes. In this work, we introduce a framework called Re-DiffiNet, which uses DDPMs to model the discrepancy between the outputs of a segmentation model such as U-Net and the ground truth. By explicitly modeling this discrepancy, Re-DiffiNet achieves an average improvement of 0.55\% in Dice score and 16.28\% in the 95th-percentile Hausdorff distance (HD95) over 5-fold cross-validation, compared to a state-of-the-art U-Net segmentation model. The code is available at
\url{https://github.com/KurtLabUW/Re-DiffiNet.git}.

\end{abstract}

\begin{keywords}
Tumor segmentation, DDPMs, MRI, Deep learning
\end{keywords}

\section{Introduction}

Glioblastoma is the most frequent primary malignant brain tumor in adults, representing approximately 57\% of all gliomas and 48\% of all primary malignant central nervous system (CNS) tumors~\cite{ostrom2018cbtrus,tan2020management}. Gliomas are a heterogeneous group of tumors characterized by their resemblance to glial cells, which perform a variety of important functions, including supporting neurons~\cite{isensee2021nnu,9077067}.

The treatment of glioma patients generally consists of surgery, radiotherapy, and chemotherapy, and patient outcomes vary widely according to glioma type and prognostic factors. Because of its superior soft-tissue contrast, multimodal magnetic resonance imaging (MRI), which allows the complexity and heterogeneity of the tumor lesion to be visualized better than computed tomography (CT), has become the gold standard for surgical decision-making in glioma patients~\cite{hanif2017glioblastoma,keunen2014multimodal,van2019perfusion}. However, visual identification of tumor margins in CT or MRI remains a challenge for neurosurgeons and researchers~\cite{wang2019advance}. Clinically, brain tumor masks are obtained from MRI scans, which requires experienced radiologists to manually segment tumor sub-regions~\cite{baid2021rsnaasnrmiccai}. This process is time-consuming and does not scale to the needs of all patients. The recent growth of machine learning technologies therefore holds promise for providing reliable, automated segmentation that saves time and assists medical professionals~\cite{Luu2022}.

Deep learning techniques have been widely used in brain tumor segmentation, and U-Net-based models represent the state of the art. U-Net and its variants, such as U-Net++~\cite{UNetPP}, 3D U-Net~\cite{3DUNet}, V-Net~\cite{VNet}, Attention-U-Net~\cite{AttUNet}, Trans-U-Net~\cite{chen2021transunet}, and Swin-U-Net~\cite{cao2022swin}, have been applied extensively to this task. Transformer architectures have also been applied to brain tumor segmentation, and Trans-U-Net and Swin-U-Net show potential for predicting accurate tumor margins. However, the state-of-the-art models in brain tumor segmentation are still based on encoder-decoder architectures such as U-Net~\cite{isensee2021nnu} and its variations. For instance, Luu et al.~\cite{Luu2022} modified the nnU-Net model by adding axial attention to the decoder. Futrega et al.~\cite{futrega2021optimized} optimized the U-Net model by adding a one-hot encoded foreground-voxel channel to the input data and by increasing the encoder depth and the number of convolutional filters. Siddiquee et al.~\cite{siddiquee2021redundancy} applied adaptive ensembling to minimize redundancy under perturbations.

% Uet problems 
While U-Net-based architectures have led to significant improvements in region-based metrics for tumor segmentation, such as the Dice score, it is also important to improve boundary-distance metrics such as the Hausdorff distance (HD)~\cite{karimi2019reducing, yeghiazaryan2018family}. Accurately locating tumor boundaries is crucial for surgical planning, so modeling techniques that can capture finer details and high-frequency information at the boundaries are desirable. One of the critical factors that makes predicting tumor boundaries difficult is the inherent variability of tumor attributes at the boundaries; modeling techniques therefore also need to capture the variability in tumor shapes.

% Diffusion model
Generative modeling techniques have improved greatly in recent years. In particular, Generative Adversarial Networks (GANs) and denoising diffusion models have been used to generate high-quality images.
While GANs can generate images of high fidelity, they are prone to mode collapse and thus often fail to capture the variability of the data they seek to model. Denoising diffusion models, on the other hand, have been shown to achieve both good mode coverage, i.e., capturing the variability in the data~\cite{VDM}, and high image quality~\cite{DMBGOIS}.
Only a few studies have applied diffusion models to brain tumor segmentation, but they have shown promising results~\cite{xing2023diff,wolleb2022diffusion,wu2022medsegdiff}.

In this work, we introduce a framework called Re-DiffiNet, which uses Denoising Diffusion Probabilistic Models to model the discrepancy between the outputs of a segmentation model such as U-Net and the ground truth. By explicitly modeling the discrepancy, we aim to build upon previous segmentation models, force the diffusion model to focus explicitly on the regions that other models miss, and exploit the ability of diffusion models to capture finer details and variability in the data.





% \begin{figure}[t]
%  % Caption and label go in the first argument and the figure contents
%  % go in the second argument
% \floatconts
%   {fig:fig1}
%   {\caption{Model predictions of segmentation results of three segmented labels produced by Re-DiffiNet on BraTS2023 dataset. First row shows the four input contrasts, the Second row shows our model predictions, and the third row shows the ground truth labels.  }}
%   {\includegraphics[width=0.5\linewidth]{Figure1.PNG}}
% \end{figure}





\section{Methods}
\subsection{Model Architectures} \label{2}
In summary, we first trained a state-of-the-art U-Net model to predict three tumor labels, and then tested several diffusion architectures that take the U-Net predictions as input, including one that predicts the discrepancy between the ground truth and the U-Net labels.
\subsubsection{Baseline U-Net } \label{2.1}
We adopted the optimized U-Net~\cite{futrega2021optimized} as our baseline model architecture~\cite{ren2024optimization} for comparison purposes. U-Net has a characteristic symmetric U-shaped architecture that can be divided into two parts: an encoder and a decoder. The encoder comprises 5 levels of same-resolution convolutional layers with strided-convolution downsampling. The decoder follows the same structure, with transposed-convolution upsampling and convolutions operating on skip features concatenated from the encoder branch at the same level. The training dataset consists of pairs $\{(I^{(i)}, x_0^{(i)})\}_{i=1}^{N}$, where $I \in \mathbb{R}^{4 \times D \times W \times H}$ represents the four 3D MRI contrasts as a multi-channel input and $x_0 \in \mathbb{R}^{3 \times D \times W \times H}$ represents the associated segmentation mask, with one binary channel for each of 3 tumor labels: 1) whole tumor (WT), 2) enhancing tumor (ET), and 3) tumor core (TC). The baseline U-Net predicts the tumor labels $\hat{x}_0$ given the input $I$:
\begin{equation}\label{eq:eq1}
\hat{x}_0 = U(I)
\end{equation}
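As a worked illustration of the encoder described above, assume that each of the 4 transitions between the 5 levels halves the spatial resolution with a stride-2 convolution, and take a hypothetical input patch of $128^3$ voxels; neither the stride nor the patch size is stated in this section, so both are assumptions. The spatial size at encoder level $\ell$ is then
\begin{equation*}
s_\ell = \frac{s_1}{2^{\ell-1}}, \quad \ell = 1, \dots, 5, \qquad 128 \rightarrow 64 \rightarrow 32 \rightarrow 16 \rightarrow 8,
\end{equation*}
and the decoder reverses this sequence with transposed convolutions, concatenating the encoder skip features of matching size at each level.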

%Leaky ReLU (lReLU) with a slope of 0.01 and batch normalization were applied after every convolution operation.

\subsubsection{U-Net augmented Diffusion (UA-Diffusion)} \label{2.2}

This model builds upon Diff-U-Net~\cite{xing2023diff}, which uses conditional DDPMs. DDPMs learn to denoise images at various noise levels. Once trained, a DDPM can take randomly drawn noise (usually Gaussian) and successively denoise it over several steps to generate a sample from the data distribution~\cite{ddpm}. Diff-U-Net conditions the DDPM on the MRIs: a denoising U-Net $DU$ takes the MRI contrasts $I$ concatenated with the noisy mask $x_t$ at diffusion step $t$, together with multi-scale features $\hat{I}_f$ extracted from the MRIs, and predicts the clean mask, as shown in equation~\ref{eq:eq_diffunet}~\cite{xing2023diff}.

\begin{equation}\label{eq:eq_diffunet}
\hat{x}_0 = DU(\text{cat}(I,x_t), t, \hat{I}_f)
\end{equation}
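For reference, the standard DDPM formulation~\cite{ddpm} on which Diff-U-Net and our models build can be summarized as follows; the noise schedule $\beta_1, \dots, \beta_T$ and the number of steps $T$ are generic symbols here, not values used in this work. A clean mask $x_0$ is corrupted in closed form as
\begin{equation*}
x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, \mathrm{Id}), \qquad \bar{\alpha}_t = \prod_{s=1}^{t} (1-\beta_s),
\end{equation*}
so that, for a typical schedule, $x_t$ approaches pure Gaussian noise as $t \rightarrow T$. Because $DU$ in equation~\ref{eq:eq_diffunet} outputs an estimate $\hat{x}_0$ of the clean mask, it is trained so that $\hat{x}_0 \approx x_0$ for noisy inputs $x_t$ drawn at random steps $t$; at inference, sampling starts from $x_T \sim \mathcal{N}(0, \mathrm{Id})$ and applies the denoiser repeatedly.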

In comparison, we condition our diffusion model on the predictions of the baseline U-Net ($U(I)$) in addition to the MRIs ($I$), as shown in Figure~\ref{fig:fig2}. We tested three variants of this approach, which differ in their inputs: 1) conditioning the diffusion model on the U-Net output $U(I)$ only, 2) conditioning it on the concatenation of the MRI contrasts $I$ and the U-Net predictions $U(I)$, and 3) conditioning it on the MRIs $I$ masked by the U-Net predictions $U(I)$. In the third variant, a mask with value 1 at each voxel predicted as tumor and 0.2 at each non-tumor voxel is applied to each of the 4 MRI contrasts, and the masked contrasts are concatenated and used as input. The resulting masked input is given by equation~\ref{eq:eq4}:
\begin{equation}
\label{eq:eq4}
\begin{aligned}
M'(x, y, z) &= 
\begin{cases} 
1 & \text{if } U(I)[x, y, z] > 0, \\
0.2 & \text{if } U(I)[x, y, z] = 0,
\end{cases} \\
\text{mask}(I,U(I)) &= \text{cat}\left(I_1 \circ M',\, I_2 \circ M',\, I_3 \circ M',\, I_4 \circ M'\right),
\end{aligned}
\end{equation}
where $(x, y, z)$ are voxel indices, $I_i$ ($i = 1, \dots, 4$) denotes the $i$-th MRI contrast, and $\circ$ denotes element-wise multiplication.

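As a small worked example of equation~\ref{eq:eq4}, consider four hypothetical voxels of one Z-score-normalized contrast, $I_1 = (1.5, -0.5, 2.0, 0.5)$, and treat $U(I)$ as a single binary map $(0, 1, 1, 0)$ over the same voxels; both vectors are illustrative values, not data. Then
\begin{equation*}
M' = (0.2,\, 1,\, 1,\, 0.2), \qquad I_1 \circ M' = (0.3,\, -0.5,\, 2.0,\, 0.1),
\end{equation*}
so intensities inside the predicted tumor are kept unchanged, while the surrounding tissue is attenuated to 20\% rather than removed, which presumably leaves the diffusion model some context outside the U-Net prediction.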

Among the three variants, conditioning on the concatenation of the MRI contrasts and the U-Net predictions performed best (Table~\ref{tab:results1}), so we used this variant for the remaining experiments. It is expressed in equation~\ref{eq:eq2}:

\begin{equation}\label{eq:eq2}
\hat{x}_0 = DU(\text{cat}(U(I),I,x_t), t, \hat{I}_f)
\end{equation}
where $t$ is the diffusion time step (provided to $DU$ as a time embedding), $x_t$ is the corresponding noisy mask, and $\hat{I}_f = \xi(\text{cat}(U(I),I))$ are the multi-scale features extracted by a trainable copy $\xi$ of the encoder of the denoising U-Net $DU$. These multi-scale features are added to the outputs of the corresponding layers of $DU$ (see Figure~\ref{fig:fig2}).
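To make the inputs of equation~\ref{eq:eq2} concrete: assuming all concatenations are along the channel dimension and that the noisy mask $x_t$ has the same shape as $x_0$, the channel counts follow directly from the shapes defined in Section~\ref{2.1},
\begin{equation*}
\text{cat}(U(I), I, x_t) \in \mathbb{R}^{(3+4+3) \times D \times W \times H} = \mathbb{R}^{10 \times D \times W \times H},
\qquad
\text{cat}(U(I), I) \in \mathbb{R}^{7 \times D \times W \times H},
\end{equation*}
so the denoising U-Net $DU$ receives 10 input channels, while the feature encoder $\xi$ receives 7. For comparison, the Diff-U-Net input $\text{cat}(I, x_t)$ in equation~\ref{eq:eq_diffunet} has $4 + 3 = 7$ channels.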


\subsubsection{Re-DiffiNet} \label{2.3}
Our proposed Re-DiffiNet architecture is similar to UA-Diffusion. However, instead of generating the ground truth segmentation masks $x_0$, it generates the absolute discrepancy between the ground truth masks and the baseline U-Net's predictions, i.e., $\Delta x_0 = |U(I) - x_0|$ (Figure~\ref{fig:fig2}). These discrepancy masks $\Delta x_0$ have a value of 1 at each voxel that is predicted incorrectly by the baseline U-Net $U$ and 0 at voxels where the prediction is correct.
Once we have generated the estimated discrepancies $\Delta\hat{x}_0$, the tumor mask predictions can be obtained as shown in equation~\ref{eq:eq3}:
\begin{equation}\label{eq:eq3}
\Delta\hat{x}_0 = DU(\text{cat}(U(I),I,x_t), t, \hat{I}_f)
\;\Rightarrow\;
\hat{x}_0 = \left|U(I) - \Delta\hat{x}_0\right|
\end{equation}
Subtracting the estimated discrepancy $\Delta\hat{x}_0$ from $U(I)$ and taking the absolute value flips every voxel of $U(I)$ that is estimated to be incorrect ($1 \rightarrow 0$ and $0 \rightarrow 1$), while voxels estimated to be correct remain unchanged.
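The correction step can be checked voxel by voxel. Assuming $U(I)$ and $\Delta\hat{x}_0$ are binary (e.g., after thresholding, which is an assumption here), $|a - \delta|$ for $a, \delta \in \{0, 1\}$ equals $a$ when $\delta = 0$ and $1 - a$ when $\delta = 1$, i.e., it is the exclusive or of $a$ and $\delta$. Consequently, if the discrepancy is estimated perfectly ($\Delta\hat{x}_0 = \Delta x_0 = |U(I) - x_0|$), the ground truth is recovered exactly. For four illustrative voxels:
\begin{equation*}
U(I) = (1, 1, 0, 0), \quad x_0 = (1, 0, 1, 0)
\;\Rightarrow\;
\Delta x_0 = (0, 1, 1, 0)
\;\Rightarrow\;
|U(I) - \Delta x_0| = (1, 0, 1, 0) = x_0 .
\end{equation*}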

\textbf{Discrepancy U-Net:} To test whether the observed improvements stem from combining discrepancy modeling with a diffusion model, rather than from discrepancy modeling alone, we also trained a second U-Net (Discrepancy U-Net) to predict the discrepancies and correct the outputs of the baseline optimized U-Net. We compare the results of the Discrepancy U-Net with Re-DiffiNet in Tables~\ref{tab:results1} and~\ref{tab:results2}.




\subsection{Training details}
Our models were implemented in PyTorch and MONAI and trained on 2 NVIDIA A40 GPUs.
The models were trained on three overlapping regions: whole tumor (WT), tumor core (TC), and enhancing tumor (ET). TC comprises ET as well as the necrotic (NCR) parts of the tumor, and WT describes the complete extent of the disease. The diffusion models were trained with a compound loss function comprising Dice loss, binary cross-entropy (BCE) loss, and mean squared error (MSE) loss. The models were optimized with AdamW using a learning rate of $10^{-4}$ and a weight decay of $10^{-4}$. Performance was evaluated with 5-fold cross-validation: the data were randomly shuffled and split into 5 equal-sized groups. We report two metrics: the Dice similarity coefficient (Dice), which measures the overlap between the model prediction and the ground truth, and the 95th-percentile Hausdorff distance (HD95), which measures the boundary distance between the model prediction and the ground truth.
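The training objective and the two evaluation metrics can be written explicitly. The relative weights of the loss terms are not given above, so the form below assumes equal weights. For a predicted mask $\hat{x}_0$ and ground truth $x_0$,
\begin{equation*}
\mathcal{L} = \mathcal{L}_{\text{Dice}}(\hat{x}_0, x_0) + \mathcal{L}_{\text{BCE}}(\hat{x}_0, x_0) + \mathcal{L}_{\text{MSE}}(\hat{x}_0, x_0),
\end{equation*}
and for Re-DiffiNet the target would presumably be the discrepancy $\Delta x_0$ instead of $x_0$. For a binary prediction $P$ and ground truth $G$ with boundaries $\partial P$ and $\partial G$, a common definition of the metrics is
\begin{equation*}
\text{Dice}(P, G) = \frac{2\,|P \cap G|}{|P| + |G|}, \qquad
\text{HD95}(P, G) = \max\Big\{ \operatorname{perc}_{95}\{ d(p, \partial G) : p \in \partial P \},\; \operatorname{perc}_{95}\{ d(g, \partial P) : g \in \partial G \} \Big\},
\end{equation*}
where $d(\cdot, \cdot)$ is the Euclidean distance from a point to a surface and $\operatorname{perc}_{95}$ denotes the 95th percentile. For example, a prediction of 100 voxels and a ground truth of 120 voxels that overlap in 90 voxels give $\text{Dice} = 2 \cdot 90 / (100 + 120) = 180/220 \approx 81.8\%$.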

%\subsubsection{Dataset}
\subsection{Dataset}
The training dataset provided for the BraTS 2023 Adult Glioma challenge~\cite{baid2021rsna} consists of 1251 brain MRI scans with segmentation annotations of the tumor regions. The 3D volumes were skull-stripped and resampled to 1\,mm$^3$ isotropic resolution, with dimensions of $240 \times 240 \times 155$ voxels. For each case, four modalities are provided: native T1-weighted (T1), post-contrast T1-weighted (T1Gd), T2-weighted (T2), and T2 Fluid Attenuated Inversion Recovery (T2-FLAIR). Segmentation labels were annotated manually by one to four experts. Annotations consist of three disjoint classes: enhancing tumor (ET), peritumoral edematous tissue (ED), and necrotic tumor core (NCR). As preprocessing, we applied Z-score normalization to the voxel intensities of every MRI contrast and then rescaled them.
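For concreteness, the volume size and fold sizes implied by the description above are
\begin{equation*}
240 \times 240 \times 155 = 8{,}928{,}000 \text{ voxels per contrast}, \qquad
1251 = 4 \times 250 + 251,
\end{equation*}
so each case provides $4 \times 8{,}928{,}000 = 35{,}712{,}000$ input intensities before any cropping, and 5-fold cross-validation yields four folds of 250 cases and one of 251. For contrast $i$, Z-score normalization maps each intensity to $(I_i - \mu_i)/\sigma_i$, where $\mu_i$ and $\sigma_i$ are the mean and standard deviation of that contrast; whether these statistics are computed over brain voxels only, and the exact range of the subsequent rescaling, are assumptions left open here.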

%To get the ground truth labels for these datasets, all imaging volumes have then been segmented using the STAPLE \cite{warfield2004simultaneous} fusion of previous top-ranked BraTS algorithms, such as nnU-Net \cite{isensee2021nnu}. These segmented labels were then refined manually by volunteer neuroradiology experts following a consistently communicated annotation protocol. The manually refined annotations were finally approved by experienced board-certified attending neuro-radiologists.


\begin{figure}[t]
 % Caption and label go in the first argument and the figure contents
 % go in the second argument
\floatconts
  {fig:fig2}
  {\caption{Re-DiffiNet uses the MRIs and the predictions of the baseline U-Net as inputs to predict which voxels in the U-Net predictions are incorrect, and corrects those voxels to generate refined tumor masks.}}
  {\includegraphics[width=1\linewidth, clip, trim=0 0.8em 0 0]{Figure2.PNG}}
\end{figure}

% \subsubsection{Evaluation Metrics}



\section{Experiments and Results}\label{3}





We trained three models: 1) the baseline U-Net (Section~\ref{2.1}), 2) U-Net augmented diffusion (UA-Diffusion; Section~\ref{2.2}), and 3) \emph{Re-DiffiNet} (Section~\ref{2.3}). We first trained the baseline U-Net model; its predictions were then used as inputs to UA-Diffusion and Re-DiffiNet. For comparison, we also report results for Diff-U-Net~\cite{xing2023diff} and the Discrepancy U-Net (Section~\ref{2.3}). In a preliminary study, we trained the three variants of UA-Diffusion (Section~\ref{2.2}) on a random train-test split and compared them with the baseline U-Net (Table~\ref{tab:results1}).
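The inference pipeline described above can be summarized as the following pseudocode sketch. It only restates the steps given in Section~\ref{2}; the binarization of the network outputs and the reverse-diffusion update rule, written as a hypothetical placeholder $\mathrm{Step}$, are assumptions standing in for details not specified above.

\begin{algorithm2e}[t]
\caption{Sketch of Re-DiffiNet inference}
\label{alg:rediffinet}
\DontPrintSemicolon
\KwIn{MRI contrasts $I$, trained baseline U-Net $U$, trained denoising U-Net $DU$ with feature encoder $\xi$}
\KwOut{refined tumor masks $\hat{x}_0$}
$y \leftarrow U(I)$ \tcp*{binary baseline prediction (thresholding assumed)}
$\hat{I}_f \leftarrow \xi(\text{cat}(y, I))$\;
sample $x_T \sim \mathcal{N}(0, \mathrm{Id})$\;
\For{$t = T, T-1, \dots, 1$}{
  $\Delta\hat{x}_0 \leftarrow DU(\text{cat}(y, I, x_t), t, \hat{I}_f)$\;
  $x_{t-1} \leftarrow \mathrm{Step}(x_t, \Delta\hat{x}_0, t)$ \tcp*{hypothetical reverse-diffusion update}
}
$\hat{x}_0 \leftarrow |y - \Delta\hat{x}_0|$ \tcp*{flip voxels estimated to be incorrect}
\end{algorithm2e}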

\begin{table}[t]
 % The first argument is the label.
 % The caption goes in the second argument, and the table contents
 % go in the third argument.
\floatconts
{tab:results1}%
{\caption{Comparison of the model architectures described in Section~\ref{2} (preliminary study).}}%
{
\begin{tabular}{l|cccc|cccc}
\hline
\multicolumn{1}{c|}{\multirow{2}[2]{*}{Model}} & \multicolumn{4}{c|}{Dice}     & \multicolumn{4}{c}{HD95(mm)} \\
\cline{2-9}
          & WT    & ET    & TC    & Avg   & WT    & ET    & TC    & Avg \\
\hline
Baseline U-Net & 92.63\% & \textbf{86.87\%} & 93.28\% & 90.93\% & 1.06 & 1.62 & 1.57 & 1.42 \\
\hline
Diff-U-Net  & 87.98\% & 83.92\% & 86.25\% & 86.05\% & 2.46 & 3.56 & 3.32 & 3.11\\

\hline
    UA-Diffusion & \multirow{2}[0]{*}{90.72\%} & \multirow{2}[0]{*}{83.76\%} & \multirow{2}[0]{*}{86.57\%} & \multirow{2}[0]{*}{87.02\%} & \multirow{2}[0]{*}{1.12} & \multirow{2}[0]{*}{1.90} & \multirow{2}[0]{*}{2.56} & \multirow{2}[0]{*}{1.86} \\
    {\footnotesize (Input: $U(I)$)} &       &       &       &       &       &       &       &  \\
\hline
    UA-Diffusion & \multirow{2}[0]{*}{92.86\%} & \multirow{2}[0]{*}{85.08\%} & \multirow{2}[0]{*}{91.43\%} & \multirow{2}[0]{*}{89.79\%} & \multirow{2}[0]{*}{1.39} & \multirow{2}[0]{*}{1.93} & \multirow{2}[0]{*}{1.59} & \multirow{2}[0]{*}{1.63} \\
    {\footnotesize (Input: concat$(I, U(I))$)} &       &       &       &       &       &       &       &  \\
\hline
    UA-Diffusion & \multirow{2}[0]{*}{91.32\%} & \multirow{2}[0]{*}{84.46\%} & \multirow{2}[0]{*}{91.18\%} & \multirow{2}[0]{*}{88.99\%} & \multirow{2}[0]{*}{1.46} & \multirow{2}[0]{*}{1.74} & \multirow{2}[0]{*}{1.89} & \multirow{2}[0]{*}{1.70} \\
    {\footnotesize (Input: mask$(I, U(I))$, Eq.~\ref{eq:eq4})} &       &       &       &       &       &       &       &  \\

\hline
Discrepancy U-Net  & 92.15\% &85.86\% & 93.37\% & 90.46\% & 1.18 & 1.83 & 1.55 & 1.52\\

\hline
    Re-DiffiNet & \textbf{93.23\%} & 86.79\% & \textbf{93.98\%} & \textbf{91.33\%} & \textbf{0.87}  & \textbf{1.27}  & \textbf{1.34}  & \textbf{1.16} \\
\hline
\end{tabular}%
}
\end{table}

\begin{table}[t]
 % The first argument is the label.
 % The caption goes in the second argument, and the table contents
 % go in the third argument.
\floatconts
  {tab:results2}%
  {\caption{Results of 5-fold cross-validation for four models: baseline U-Net, U-Net augmented Diffusion (UA-Diffusion) with the concatenation of MRI contrasts and baseline U-Net predictions as input, Discrepancy U-Net, and Re-DiffiNet.}}%
    {\begin{tabular}{c|c|cccc|cccc}
    \hline
    \multirow{2}{*}{Fold \#} & \multirow{2}{*}{Model} & \multicolumn{4}{c|}{Dice} & \multicolumn{4}{c}{HD95 (mm)} \\ \cline{3-10}
                                   &                        & {WT}      &  {ET}      &  {TC}      & Avg &  {WT}   &  {ET}   &  {TC}   & Avg \\ \hline
    \multirow{4}{*}{fold1}         & \multicolumn{1}{l|}{Baseline U-Net}                   &  {92.63\%} &  \textbf{86.87\%} &  {93.28\%} & 90.93\% &  {1.06} &  {1.62} &  {1.57} & 1.42    \\
                                   & \multicolumn{1}{l|}{UA-Diffusion}   &  {92.72\%} &  {86.76\%} &  {93.57\%} & 91.02\% &  {1.12} &  {1.40} &  {1.56} & 1.36    \\  
                                   & \multicolumn{1}{l|}{Discrepancy U-Net}   &  {92.15\%} &  {85.86\%} &  {93.37\%} & 90.46\% &  {1.18} &  {1.83} &  {1.55} & 1.52    \\                                    
                                   & \multicolumn{1}{l|}{Re-DiffiNet}      &  \textbf{93.23\%} &  {86.79\%} &  \textbf{93.98\%} & \textbf{91.33\%} &  \textbf{0.87} &  \textbf{1.27} &  \textbf{1.34} & \textbf{1.16}    \\ \hline
    \multirow{4}{*}{fold2}         & \multicolumn{1}{l|}{Baseline U-Net}                   &  {92.60\%} &  \textbf{88.30\%} &  {93.79\%} & 91.56\% &  {1.18} &  {1.77} &  {1.24} & 1.40    \\
                                   & \multicolumn{1}{l|}{UA-Diffusion}   &  {92.62\%} &  {87.86\%} &  {94.09\%} & 91.52\% &  {1.19} &  {1.73} &  {1.18} & 1.37    \\
                                   & \multicolumn{1}{l|}{Discrepancy U-Net}   &  {92.83\%} &  {88.15\%} &  {94.32\%} & \textbf{91.77\%} &  {1.10} &  {1.79} &  {1.03} & 1.30    \\
                                   & \multicolumn{1}{l|}{Re-DiffiNet}      &  \textbf{93.04\%} &  {87.34\%} &  \textbf{94.48\%} & 91.62\% &  \textbf{0.97} &  \textbf{1.67} &  \textbf{0.85} & \textbf{1.16}    \\ \hline
    \multirow{4}{*}{fold3}         & \multicolumn{1}{l|}{Baseline U-Net}                   &  {92.40\%} &  {87.04\%} &  {92.47\%} & 90.64\% &  {1.41} &  {1.78} &  {1.46} & 1.55    \\
                                   & \multicolumn{1}{l|}{UA-Diffusion}   &  \textbf{92.93\%} &  {87.22\%} &  \textbf{93.21\%} & \textbf{91.12\%} &  \textbf{1.05} &  {1.62} &  \textbf{1.14} & \textbf{1.27}    \\ 
                                   & \multicolumn{1}{l|}{Discrepancy U-Net}   &  {92.75\%} &  {87.13\%} &  {92.86\%} & {90.91\%} &  {1.38} &  {1.80} &  {1.35} & {1.51}    \\                                 
                                   & \multicolumn{1}{l|}{Re-DiffiNet}      &  {92.86\%} &  \textbf{87.23\%} &  {93.11\%} & 91.07\% &  {1.07} &  \textbf{1.60} &  {1.21} & 1.29    \\ \hline
    \multirow{4}{*}{fold4}         & \multicolumn{1}{l|}{Baseline U-Net}                   &  {91.21\%} &  {86.90\%} &  {92.66\%} & 90.26\% &  {1.62} &  {1.74} &  {1.30} & 1.55    \\
                                   & \multicolumn{1}{l|}{UA-Diffusion}   &  {91.32\%} &  {86.25\%} &  \textbf{92.99\%} & 90.19\% &  {1.61} &  {1.73} &  {1.26} & 1.53    \\
                                   & \multicolumn{1}{l|}{Discrepancy U-Net}   &  {91.67\%} &  {86.57\%} &  {92.38\%} & 90.21\% &  {1.61} &  {1.73} &  {1.26} & 1.53    \\
                                   & \multicolumn{1}{l|}{Re-DiffiNet}      &  \textbf{91.73\%} &  \textbf{87.18\%} &  {92.91\%} & \textbf{90.61\%} &  \textbf{1.58} &  \textbf{1.64} &  \textbf{1.21} & \textbf{1.48}    \\ \hline
    \multirow{4}{*}{fold5}         & \multicolumn{1}{l|}{Baseline U-Net}                   &  {91.30\%} &  {86.61\%} &  {93.25\%} & 90.39\% &  {1.34} &  {1.72} &  {1.16} & 1.41    \\
                                   & \multicolumn{1}{l|}{UA-Diffusion}   &  {91.43\%} &  {87.01\%} &  {93.56\%} & 90.67\% &  {1.30} &  {1.68} &  {1.18} & 1.39    \\
                                   & \multicolumn{1}{l|}{Discrepancy U-Net}   &  {91.37\%} &  {87.36\%} &  {93.47\%} & 90.73\% &  {1.35} &  {1.47} &  {1.21} & 1.33    \\
                                   & \multicolumn{1}{l|}{Re-DiffiNet}      &  \textbf{92.72\%} &  \textbf{87.81\%} &  \textbf{94.30\%} & \textbf{91.61\%} &  \textbf{1.15} &  \textbf{1.37} &  \textbf{1.06} & \textbf{1.04}    \\ \hline

    \end{tabular}}
\end{table}

In the preliminary study (Table~\ref{tab:results1}), using diffusion to predict tumor masks directly did not improve over the baseline U-Net: all three UA-Diffusion variants scored below it in both average Dice and average HD95. In contrast, using a diffusion model to predict discrepancies and using them to correct the U-Net outputs improved performance, especially in HD95. Among the three UA-Diffusion approaches, concatenating the U-Net predictions and the MRIs yielded the best performance, so we used this UA-Diffusion variant for the 5-fold cross-validation in Table~\ref{tab:results2}.

Table~\ref{tab:results2} reports the 5-fold cross-validation results: the Dice score and the 95th-percentile Hausdorff distance (HD95) of all methods on the three overlapping regions, whole tumor (WT), enhancing tumor (ET), and tumor core (TC), of the BraTS 2023 dataset, together with their averages. In cross-validation, using the diffusion model to directly output the tumor segmentation mask (UA-Diffusion) does lead to improvements over the baseline U-Net, but these improvements are modest: 0.12\% in Dice and 5.61\% in HD95.
In contrast, Re-DiffiNet improved HD95 by 16.28\%, suggesting the benefit of modeling the discrepancy with diffusion models, while its Dice score remained comparable to that of the baseline U-Net (a 0.55\% improvement). Figure~\ref{fig:fig3} shows an example of the segmentation masks produced by the baseline U-Net and Re-DiffiNet.
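Assuming the reported improvements are relative changes with respect to the baseline U-Net (lower HD95 is better), they can be illustrated with the fold-1 averages from Table~\ref{tab:results2}:
\begin{gather*}
\frac{\text{HD95}_{\text{U-Net}} - \text{HD95}_{\text{Re-DiffiNet}}}{\text{HD95}_{\text{U-Net}}} = \frac{1.42 - 1.16}{1.42} \approx 18.3\%, \\
\frac{\text{Dice}_{\text{Re-DiffiNet}} - \text{Dice}_{\text{U-Net}}}{\text{Dice}_{\text{U-Net}}} = \frac{91.33 - 90.93}{90.93} \approx 0.44\%.
\end{gather*}
The reported 16.28\% and 0.55\% are averages over all five folds, so a single fold need not match them.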

\begin{figure}[t]
 % Caption and label go in the first argument and the figure contents
 % go in the second argument
\floatconts
  {fig:fig3}
  {\caption{A comparison between the segmentations generated by the baseline U-Net and by Re-DiffiNet. In this example, Re-DiffiNet corrects the false-positive lesion in the tumor core mask predicted by the baseline U-Net and produces a smoother boundary.}}
  {\includegraphics[width=0.5\linewidth, clip, trim=0 1em 0 0.8em]{Figure3.PNG}}
\end{figure}


\section{Discussion and Conclusion}
In this paper, we proposed Re-DiffiNet, a tumor segmentation framework that uses diffusion models to refine the predictions of a tumor segmentation model such as the optimized U-Net. Most tumor segmentation studies optimize for region-based metrics like the Dice score and report Dice scores of 90\% or higher. However, boundary-distance metrics like HD95 are also critical, and improving on them without sacrificing Dice performance is highly desirable. In this work, we investigated the potential of diffusion models to refine predictions generated by state-of-the-art U-Net models. 

We found that while using diffusion models to generate tumor masks directly led to modest improvements over the baseline U-Net, the largest improvements came from discrepancy modeling, i.e., predicting the differences between the ground truth masks and the baseline U-Net's outputs. This is reflected in the 16.28\% improvement in HD95, indicating substantially better delineation of tumor boundaries. Although discrepancies can be modeled with other techniques (even a U-Net), which effectively acts as a boosting method, we chose diffusion models primarily for their ability to generate high-fidelity visual details and to capture variability in the data distribution, both of which brain tumors exhibit. A further potential benefit of using a diffusion model rather than a second U-Net to refine the baseline U-Net's predictions is that, because diffusion models are trained with a fundamentally different mechanism, they may learn more robust and diverse representations of the data. 


Our work shows the potential of further improving tumor segmentation by combining diffusion models with discrepancy modeling. Here, we evaluated Re-DiffiNet on the segmentation of gliomas; in the future, we intend to apply our approach to other tumor types, such as meningiomas and pediatric brain tumors.



% Acknowledgments---Will not appear in anonymized version
\midlacknowledgments{Juampablo Heras Rivera is supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Department of Energy Computational Science Graduate Fellowship under Award Number DE-SC0024386.}


\bibliography{midl-samplebibliography}

\clearpage

\appendix

\counterwithin{figure}{section}
 \newpage%
 \renewcommand{\thesection}{\Alph{section}}% For Alpha numeric number
 \section{Examples of Segmentation Results}
 
This section presents additional examples of tumor labels predicted by our proposed Re-DiffiNet and by the baseline U-Net.

\begin{figure}[htp]
 % Caption and label go in the first argument and the figure contents
 % go in the second argument
\floatconts
  {fig:figa1}
  {\caption{
In the left example, Re-DiffiNet improves the HD95 score by an average of 0.12\,mm, while the Dice improvement is only 0.03\%. In the right example, Re-DiffiNet improves the Dice score by 3.09\% and the HD95 score by 0.61\,mm compared with the baseline U-Net.}}
  {\includegraphics[width=1\linewidth, clip, trim=0 1em 0 0.8em]{Figurea1.PNG}}
\end{figure}
 

\section{Statistical Analysis}

We employ a left-tailed paired two-sample $t$-test to assess whether Re-DiffiNet reduces the HD95 score relative to the baseline. Let $X$ and $Y$ denote the HD95 scores of our proposed Re-DiffiNet and of our baseline optimized U-Net, respectively, on the same case, and let $\mathbf{D} = X - Y$ be the paired difference, with mean $\mu_{\mathbf{D}} = E(X-Y) = \mu_X - \mu_Y$; the paired differences are assumed to be independent across cases. We first checked the normality assumption for $\mathbf{D}$ with a Q-Q plot. We then tested the null hypothesis $H_0: \mu_{\mathbf{D}} \geq 0$ against the alternative $H_1: \mu_{\mathbf{D}} < 0$. The resulting $t$-statistic is $-2.49$ with a $p$-value of $0.0067$ ($<0.01$), so the null hypothesis is rejected at the $0.01$ significance level in favor of $H_1$, indicating that our proposed method performs better (with a smaller distance) than the baseline U-Net.
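
For reference, the statistic above follows the textbook form of the paired $t$-test. Writing $n$ for the number of paired cases and $d_i = x_i - y_i$ for the per-case HD95 differences (notation introduced here for illustration, assuming each test uses all cases of the corresponding evaluation split),
\begin{equation*}
t = \frac{\bar{d}}{s_d/\sqrt{n}}, \qquad \bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^2},
\end{equation*}
and the left-tailed $p$-value is $p = P(T_{n-1} \leq t)$, where $T_{n-1}$ follows a Student's $t$-distribution with $n-1$ degrees of freedom when $\mu_{\mathbf{D}} = 0$. A negative $t$ therefore indicates that the HD95 of Re-DiffiNet is smaller on average than that of the baseline.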

For the individual folds, the left-tailed paired two-sample test yields the following $p$-values ($p_i$ for fold $i$): $p_2 = 0.061$ ($<0.1$), $p_3 = 0.0095$ ($<0.01$), $p_4 = 0.069$ ($<0.1$), and $p_5 = 0.000053$ ($<0.01$).

For the Dice score, we perform a similar statistical analysis and find no significant difference between our proposed Re-DiffiNet and the baseline U-Net.
\end{document}
