

\subsection{Dataset and evaluation measures}
The FLARE2022 dataset is curated from more than 20 medical groups under license permission, including MSD \cite{simpson2019MSD}, KiTS \cite{KiTS,KiTSDataset}, AbdomenCT-1K \cite{AbdomenCT-1K}, and TCIA \cite{clark2013TCIA}. The training set includes 50 labelled CT scans with pancreas disease and 2000 unlabelled CT scans with liver, kidney, spleen, or pancreas diseases. The validation set includes 50 CT scans with liver, kidney, spleen, or pancreas diseases.
The testing set includes 200 CT scans, of which 100 cases have liver, kidney, spleen, or pancreas diseases and the other 100 cases have uterine corpus endometrial, urothelial bladder, stomach, sarcomas, or ovarian diseases. All CT scans contain image information only; the center information is not available.

The evaluation measures consist of two accuracy measures: Dice Similarity Coefficient (DSC) and Normalized Surface Dice (NSD), and three running efficiency measures: running time, area under GPU memory-time curve, and area under CPU utilization-time curve. All measures will be used to compute the ranking. Moreover, the GPU memory consumption has a 2 GB tolerance.
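
For reference, the DSC between a predicted mask $P$ and a ground-truth mask $G$ of a single organ follows the standard definition below. The symbols $P$ and $G$ are our own notation and are not taken from the challenge description.
\begin{equation}
\mathrm{DSC}(P, G) = \frac{2\,|P \cap G|}{|P| + |G|}
\end{equation}
NSD is computed analogously on the boundary surfaces of $P$ and $G$, counting only the surface points that lie within a fixed tolerance of the other surface.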

\subsection{Implementation details}
\subsubsection{Environment settings}
The development environments and requirements are presented in Table~\ref{table:env}.


\begin{table}[!h]
\caption{Development environments and requirements.}\label{table:env}
\centering
\begin{tabular}{| l | l |}
\hline
Windows/Ubuntu version       & Ubuntu 18.04.5 LTS\\
\hline
CPU   & Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz \\
\hline
RAM                         & 1$\times$32 GB \\
\hline
GPU (number and type)                         & One NVIDIA Quadro RTX 5000 (16 GB)\\
\hline
CUDA version                  & 11.6\\                          \hline
Programming language                 & Python 3.10\\ 
\hline
Deep learning framework & PyTorch (Torch 1.11.0, torchvision 0.12.0) \\
% \hline
% Specific dependencies         &                        \\                                                                      
% \hline
% (Optional) Link to code     &                                                                \\
\hline
\end{tabular}
\end{table}

\vspace{-0.5cm}
\subsubsection{Training protocols}

We find that simple 2D transforms, such as horizontal/vertical flipping and rotation, appear to be sufficient for both modules to generalize. In the training stage, the Reference module follows the conventional training process, in which the two models are trained concurrently. For the Propagation module, we adopt the training process of \cite{stcn21cheng}, which samples 3 neighboring slices at a time.
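
The sampling and augmentation steps described above can be sketched as follows. This is a minimal illustration and not our exact training code: the function names \texttt{sample\_neighboring\_slices} and \texttt{augment\_pair} are hypothetical, the three slices are assumed to be strictly consecutive, the flip probability of 0.5 is assumed, and rotations are assumed to be multiples of 90 degrees.

```python
import random

import numpy as np


def sample_neighboring_slices(depth, num_frames=3, rng=random):
    """Pick `num_frames` consecutive slice indices from a volume with `depth` slices.

    Assumption: "neighboring" is taken to mean strictly consecutive slices.
    """
    if depth < num_frames:
        raise ValueError("volume has fewer slices than num_frames")
    start = rng.randrange(depth - num_frames + 1)
    return list(range(start, start + num_frames))


def augment_pair(image, mask, rng=random):
    """Apply the same random flips and 90-degree rotation to a 2D image and its mask."""
    if rng.random() < 0.5:  # horizontal flip (probability 0.5 is an assumption)
        image, mask = np.flip(image, axis=1), np.flip(mask, axis=1)
    if rng.random() < 0.5:  # vertical flip (probability 0.5 is an assumption)
        image, mask = np.flip(image, axis=0), np.flip(mask, axis=0)
    k = rng.randrange(4)  # number of 90-degree rotations (assumed angle set)
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    # Return contiguous copies so the arrays can be converted to tensors safely.
    return np.ascontiguousarray(image), np.ascontiguousarray(mask)
```

Applying one shared set of random parameters to the image and its label keeps the two spatially aligned, which is the main point of the sketch.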

Table~\ref{table:training1} and Table~\ref{table:training2} list the training protocols for the Reference module and the Propagation module, respectively. In both settings, we use the original image size of $512\times512$ for the training and inference phases.

\begin{table*}[!h]
\caption{Training protocols for the Reference module: CPS of TransUNet and EfficientNet DeepLabV3+}
\label{table:training1}
\begin{center}
% \resizebox{0.47\textwidth}{!}{
\begin{tabular}{| l | l |} 
\hline
Network initialization         & Random initialization\\
\hline
Batch size                    & 2 (labeled) $+$ 2 (unlabeled) \\
\hline 
Patch size & $512\times512$  \\ 
\hline
Total iterations & 50000 \\
\hline
Optimizer          & AdamW          \\ \hline
Initial learning rate (lr)  & 0.0001 \\ \hline
Lr decay schedule & multiplied by 0.5 at iterations 40000 and 45000 \\
\hline
Training time                                           & 48 hours \\  \hline 
Loss functions & Dice Loss + Cross-Entropy Loss \\ \hline
Number of model parameters    & \makecell{105M (TransUNet ResNet50) \\ + 11M  (EfficientNet DeepLabV3+)} \footnote{PyTorch} \\ \hline
Number of FLOPs &  \makecell{108G (TransUNet ResNet50) \\+ 1.3G (EfficientNet DeepLabV3+)}  \footnote{PyTorch} \\ \hline

% CO$_2$eq & \textcolor{red}{ \textbf{Missing}} Kg\footnote{https://github.com/lfwa/carbontracker/} \\  \hline
\end{tabular}
%}
\end{center}
\end{table*}


\begin{table*}[!h]
\caption{Training protocols for the Propagation module: STCN with ResNet backbone}
\label{table:training2}
\begin{center}
% \resizebox{0.47\textwidth}{!}{
\begin{tabular}{| l | l |} 
\hline
Network initialization         & Random initialization\\
\hline
Batch size                    & 8 \\
\hline 
Patch size & $512\times512$  \\ 
\hline
Total iterations & 50000 \\
\hline
Optimizer          & AdamW          \\ \hline
Initial learning rate (lr)  & 0.0001 \\ \hline
Lr decay schedule & multiplied by 0.5 at iterations 40000 and 45000 \\
\hline
Training time                                           & 48 hours \\  \hline 
Loss functions & OhemCE Loss + Lov\'asz Loss \\ \hline
Number of model parameters    & 54,416,065 \footnote{PyTorch} \\ \hline

% Number of flops & \textcolor{red}{ \textbf{Missing}} \footnote{Pytorch} \\ \hline
% CO$_2$eq & \textcolor{red}{ \textbf{Missing}} Kg\footnote{https://github.com/lfwa/carbontracker/} \\  \hline
\end{tabular}
%}
\end{center}
\end{table*}
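
The learning rate schedule shared by both tables can be sketched as a small step function. This is an illustrative reading and not our training code: it assumes that the rate is halved once when iteration 40000 is reached and once more at iteration 45000, and the function name \texttt{learning\_rate} is hypothetical.

```python
def learning_rate(iteration, base_lr=1e-4, milestones=(40000, 45000), gamma=0.5):
    """Step schedule: multiply base_lr by gamma once for each milestone already reached.

    Assumption: the decay is applied once per milestone, not on every iteration.
    """
    passed = sum(1 for m in milestones if iteration >= m)
    return base_lr * gamma ** passed


if __name__ == "__main__":
    # Print the rate just before and at each milestone, and at the last iteration.
    for it in (0, 39999, 40000, 44999, 45000, 49999):
        print(it, learning_rate(it))
```

With PyTorch, the same steps could be expressed with \texttt{torch.optim.lr\_scheduler.MultiStepLR} (milestones 40000 and 45000, \texttt{gamma=0.5}), stepped once per iteration on top of the AdamW optimizer.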