% This is samplepaper.tex, a sample chapter demonstrating the
% LLNCS macro package for Springer Computer Science proceedings;
% Version 2.21 of 2022/01/12
%
\documentclass[runningheads]{llncs}
%
\usepackage[T1]{fontenc}
% T1 fonts will be used to generate the final print and online PDFs,
% so please use T1 fonts in your manuscript whenever possible.
% Other font encodings may result in incorrect characters.
%
\usepackage{multirow}
\usepackage{amsmath}
\usepackage{graphicx}
\usepackage[section]{placeins}
% Used for displaying a sample figure. If possible, figure files should
% be included in EPS format.
%
% If you use the hyperref package, please uncomment the following two lines
% to display URLs in blue roman font according to Springer's eBook style:
%\usepackage{color}
%\renewcommand\UrlFont{\color{blue}\rmfamily}
\usepackage[pagebackref=true,breaklinks=true,colorlinks,bookmarks=false]{hyperref}
%
\begin{document}
%
\title{A Lightweight nnU-Net Combined with Target Adaptive Loss for Organs and Tumors Segmentation}
\titlerunning{nnU-Net Combined with Target Adaptive Loss}
%
%\titlerunning{Abbreviated paper title}
% If the paper title is too long for the running head, you can set
% an abbreviated paper title here
%
\author{Tao Liu\inst{1}\orcidID{0009-0007-2933-9197} \and
Xukun Zhang\inst{1}\orcidID{0000-0003-2869-9434} \and 
Minghao Han\inst{1}\orcidID{0009-0002-0043-7539} \and
Lihua Zhang\inst{1}\orcidID{0000-0003-0467-4347}
} 
%
\authorrunning{Tao Liu et al.}
% First names are abbreviated in the running head.
% If there are more than two authors, 'et al.' is used.
%
\institute{Fudan University, Shanghai 200082, China \\
\email{lihuazhang@fudan.edu.cn}}
%
\maketitle              % typeset the header of the contribution
%
\begin{abstract}
Accurate, automated segmentation of abdominal organs and tumors is of great importance in clinical practice. Because manual annotation is time-consuming and labor-intensive, especially in the highly specialized medical domain, partially annotated and unlabeled datasets are more common in practice than fully labeled ones. CNN-based methods have advanced medical image segmentation, but most previous CNN models were trained on fully labeled datasets. It is therefore important to develop methods that can learn from partially labeled datasets. For FLARE23, we design a model that combines a lightweight nnU-Net with a target adaptive loss (TAL) to segment efficiently while making full use of the partially labeled dataset. Our method achieved average DSC scores of 86.40\% and 19.41\% for organs and lesions, respectively, on the validation set; the average running time and area under the GPU memory-time curve are 25.34s and 23018MB, respectively.


\keywords{abdominal organs and tumors segmentation  \and lightweight nnU-Net \and target adaptive loss.}
\end{abstract}



\section{Introduction}
A precise pixel-level understanding of abdominal anatomy in medical images is vital for computer-aided clinical practice, including disease diagnosis, surgical navigation, and radiation therapy. In particular, accurate segmentation of abdominal organs and lesions supports clinical workflows such as diagnostic interventions and treatment planning, and can be an essential step in preoperative diagnosis.


Thanks to the rapid development of deep learning, many abdominal organ segmentation methods have been built on deep CNNs, such as nnU-Net and 3D U-Net, and achieve strong performance on various abdominal organ datasets. However, most models require all organs of interest to be annotated, and obtaining such a dataset is unrealistic because labeling is time-consuming and labor-intensive. Segmenting multiple organs from a partially labeled dataset therefore remains an important task. Numerous studies also address abdominal organ and tumor segmentation, but they share a common limitation: each model is restricted to a certain organ and its lesions and does not transfer to the segmentation of another organ. At present there is still no general model for universal abdominal organ and tumor segmentation, so segmenting multiple organs and all tumors with a single model remains challenging.


FLARE2023 is a competition that aims to promote the development of universal organ and tumor segmentation in abdominal CT scans. The organizers provided a training set of 4000 3D CT scans from over 30 medical centers, of which 2200 cases are partially labeled and 1800 cases are unlabeled, and a validation set of 100 cases. In addition to segmenting the 13 abdominal organs precisely, submitted algorithms must recognize and segment all tumors on the different organs in abdominal CT images, which is a challenging task. This is the first challenge to focus on pan-cancer segmentation in CT scans. The competition also imposes limits on inference speed, memory, and GPU memory: each test case must be processed within 60 seconds using less than 28GB of memory, and the peak GPU memory should preferably stay below 4GB, which further increases the difficulty.


We extensively investigated image segmentation methods for partially annotated datasets, especially in the medical domain. Over the past several years, many studies have addressed abdominal multi-organ segmentation with partially annotated datasets, but the problem remains challenging. A straightforward strategy is to train one network per partially labeled dataset, but this suffers from several shortcomings: (1) less training data for each network, and (2) longer training and inference time.


Much more attention has been paid to training a single model with several partially labeled datasets. Intuitively, this strategy has many advantages, including fully utilizing the different datasets to improve model robustness. These methods can be grouped into two categories. The first category designs new networks. Chen et al.~\cite{chen} designed a network with a task-shared encoder and one task-specific decoder per partially labeled dataset, but this kind of network has proven to be memory-consuming. Zhang et al.~\cite{dodnet} proposed a dynamic on-demand network (DoDNet), which concatenates a one-hot vector whose length equals the number of organs with the image features as a task-specific prompt to generate the weights of dynamic convolution filters. The second category designs adaptive loss functions that can be applied directly to partially labeled data. Fang et al.~\cite{fang} proposed a target adaptive loss (TAL) to train a network on several partially labeled datasets by treating organs with unknown labels as background. Shi et al.~\cite{shi} merged unlabeled organs with the background by imposing a constraint on each voxel and proposed marginal and exclusive losses to train a model on one fully labeled dataset and several partially labeled datasets. Furthermore, Liu et al.~\cite{liu} studied existing partial-label segmentation approaches and identified three distinct types of supervision signals, two derived from the ground truth and one from pseudo labels. They then proposed a training framework called COSST, which combines comprehensive supervision signals with self-training on pseudo labels and has demonstrated consistently strong performance.


After reviewing existing methods for abdominal multi-organ segmentation on partially labeled datasets, we follow the design of Fang et al.~\cite{fang}: we treat unlabeled organs as background and use their target adaptive loss (TAL). Specifically, we merge the output channels of the unlabeled organs and the original background channel into a new background channel. This is necessary because most images in the FLARE23 dataset contain unlabeled organs, which makes common segmentation losses such as the Dice loss inapplicable; TAL handles this problem effectively. Moreover, because the competition imposes requirements on segmentation efficiency and memory usage, default CNNs or transformers are not well suited to this task. We reviewed the top methods in FLARE21 and FLARE22 and found that the lightweight nnU-Net designed by the top method in FLARE22 achieved remarkable efficiency without significantly reducing segmentation performance. Hence, we extend the lightweight nnU-Net proposed in FLARE22 with the target adaptive loss to segment the partially labeled dataset efficiently and effectively.


In summary, our method combines the lightweight nnU-Net with the target adaptive loss to achieve efficient and accurate segmentation. We describe it in detail in the following section.


\section{Method}
This section describes our proposed method in detail. As illustrated in Fig.~\ref{fig:Network}, it is mainly based on a lightweight nnU-Net and a target adaptive loss, which is used to handle the partially labeled dataset.
%########################### 
\subsection{Preprocessing}
Data preprocessing is vital before training. In our scheme, it consists of five parts:

(1) Statistical analysis: We analyzed the distribution of labels in the dataset and found that tumor labels are spread unevenly across different organs, which makes tumor segmentation very difficult.

(2) Geometry check: We make sure that the geometry of each label file matches the geometry of the corresponding image file. Some cases in the dataset do not meet this requirement, which would affect the subsequent operations.

(3) Cropping: We crop away regions in which all voxels are zero. These regions carry no useful information, so removing them does not affect learning but significantly reduces the image size and the computational cost.

(4) Resampling: Resampling is a crucial step because individual voxels represent different physical sizes in different images. Following the default nnU-Net setting for anisotropic datasets, the target spacing of the axis with particularly large spacing is set to the 10th percentile of the spacings along that axis in the dataset.

(5) Normalization: The purpose of normalization is to ensure that the grayscale values of all images in the training set follow the same distribution. The normalization operation in our method is the same as in nnU-Net.
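
As a hedged sketch of the normalization step, the default CT scheme of nnU-Net, as we understand it, clips intensities to the 0.5th and 99.5th percentiles of the foreground voxels of the training set and then standardizes them. The symbols $p_{0.5}$, $p_{99.5}$, $\mu$ and $\sigma$ are notation introduced here for illustration only:
\[
x' = \frac{\min\bigl(\max(x,\, p_{0.5}),\, p_{99.5}\bigr) - \mu}{\sigma},
\]
where $x$ is the intensity of a voxel, and $\mu$ and $\sigma$ are the mean and standard deviation of the foreground intensities collected over the training set.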

\subsection{Proposed Method}
Fig.~\ref{fig:Network} shows the framework of our proposed method. It mainly consists of two parts: a lightweight nnU-Net, adapted from the top method in FLARE22, and a target adaptive loss (TAL), used for training with partial labels.

Specifically, the lightweight nnU-Net modifies the default nnU-Net to improve inference speed and reduce resource consumption. The main changes are reducing the number of channels in the first stage to 16 and the number of convolutions per stage to 2. Additionally, it performs downsampling only twice during inference, the input patch size is reduced, and the input spacing is increased so that images are processed at a lower resolution. We do not apply any extra strategy to improve inference speed or reduce resource consumption beyond what the top method~\cite{FLARE22-1st-Huang} did for their small nnU-Net.


Furthermore, the target adaptive loss for a voxel $v$ can be formulated as follows:
$$L_{TAL} = -\sum_{c \in B} y_v^c \log \hat{y}_v^c - \boldsymbol{1}_{\left[\sum_{c \in B} y_v^c = 0\right]} \log\Bigl(1-\sum_{c \in B} \hat{y}_v^c\Bigr)$$ where $B$ denotes the organs labeled in the input batch, $\hat{y}_v^c$ is the predicted probability that voxel $v$ belongs to class $c$, and $y_v^c$ comes from the ground truth and indicates whether voxel $v$ is labeled as class $c$.

We treat the unlabeled organs in each image as background by merging the output channels of the unlabeled organs and the original background channel into a new background channel. The network can then be trained under the supervision of TAL.
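
To make the channel merging concrete, consider a toy example; the numbers are illustrative assumptions and are not taken from our experiments. Suppose the network has one background channel (index $0$) and three organ channels, and only organs 1 and 2 are annotated in the current image, so that $B=\{1,2\}$. For a voxel $v$ with softmax output $(\hat{y}_v^0, \hat{y}_v^1, \hat{y}_v^2, \hat{y}_v^3) = (0.1, 0.2, 0.1, 0.6)$, the probability of the merged background channel is
\[
\hat{y}_v^{0} + \hat{y}_v^{3} \;=\; 1 - \sum_{c \in B} \hat{y}_v^c \;=\; 1 - (0.2 + 0.1) \;=\; 0.7 .
\]
If $v$ is marked as background in the partial annotation, the log term that enters the loss is $\log 0.7 \approx -0.357$. The network is therefore not penalized for assigning a high probability to the unlabeled organ 3, which an ordinary cross-entropy on the original background channel ($\log 0.1 \approx -2.303$) would do.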

We used the pseudo labels of the 1800 unlabeled images, generated by the FLARE22 winning algorithm~\cite{FLARE22-1st-Huang}.

\begin{figure}[htbp]
\centering
\includegraphics[scale=0.7]{imgs/F3.png}
\caption{Network architecture, which includes a lightweight nnU-Net to segment images efficiently and TAL to train the model on the partially labeled dataset.
}
\label{fig:Network}
\end{figure}



\subsection{Post-processing}
We did not use any post-processing in our method.


\section{Experiments}
\subsection{Dataset and evaluation measures}
The FLARE 2023 challenge is an extension of FLARE 2021--2022~\cite{MedIA-FLARE21,FLARE22}, aiming to promote the development of foundation models in abdominal disease analysis. The segmentation targets cover 13 organs and various abdominal lesions. The training dataset is curated from more than 30 medical centers under the license permission, including TCIA~\cite{TCIA}, LiTS~\cite{LiTS}, MSD~\cite{simpson2019MSD}, KiTS~\cite{KiTS,KiTSDataset}, autoPET~\cite{autoPET-Data,autoPET-MICCAI22}, TotalSegmentator~\cite{TotalSegmentator}, and AbdomenCT-1K~\cite{AbdomenCT-1K}. The training set includes 4000 abdominal CT scans, of which 2200 have partial labels and 1800 have no labels. The validation and testing sets include 100 and 400 CT scans, respectively, which cover various abdominal cancer types, such as liver cancer, kidney cancer, pancreas cancer, colon cancer, and gastric cancer. The organ annotation process used ITK-SNAP~\cite{ITKSNAP}, nnU-Net~\cite{nnUNet}, and MedSAM~\cite{MedSAM}.


The evaluation metrics encompass two accuracy measures, the Dice Similarity Coefficient (DSC) and the Normalized Surface Dice (NSD), and two efficiency measures, running time and area under the GPU memory-time curve. These metrics collectively contribute to the ranking computation. Furthermore, the running time and GPU memory consumption are considered within tolerances of 15 seconds and 4 GB, respectively.
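
For reference, the DSC between a predicted segmentation $P$ and the ground truth $G$ follows the standard definition below; $P$ and $G$ are notation introduced here for illustration:
\[
\mathrm{DSC}(P, G) = \frac{2\,|P \cap G|}{|P| + |G|} .
\]
As a toy example with assumed numbers, if $|P| = 120$ voxels, $|G| = 100$ voxels and $|P \cap G| = 90$ voxels, then $\mathrm{DSC} = 180/220 \approx 81.8\%$. NSD instead measures the agreement between the two segmentation surfaces within a distance tolerance.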


\subsection{Implementation details}
\subsubsection{Environment settings}
The development environments and requirements are presented in Table~\ref{table:env}.


\begin{table}[!htbp]
\caption{Development environments and requirements.}\label{table:env}
\centering
\begin{tabular}{ll}
\hline
System       & Ubuntu 20.04.1 LTS\\
\hline
CPU   & Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz \\
\hline
RAM                         &4$\times $32GB; 2400MT$/$s\\
\hline
GPU (number and type)                         & Two NVIDIA Quadro RTX 8000 48G\\
\hline
CUDA version                  & 12.0\\                          \hline
Programming language                 & Python 3.7\\ 
\hline
Deep learning framework & torch 1.12.0, torchvision 0.13.0 \\
\hline
Specific dependencies         & None                       \\                                                                      
\hline
Code     &                                                                \\
\hline
\end{tabular}
\end{table}


\subsubsection{Training protocols}
We used the pseudo labels of the 1800 unlabeled images, generated by the FLARE22 winning algorithm~\cite{FLARE22-1st-Huang}. For the partial labels, we treated the unlabeled organs in each image as background by merging the output channels of the unlabeled organs and the original background channel into a new background channel. Furthermore, we applied the same data augmentation, patch sampling strategy, and optimal model selection criteria as the default settings of nnU-Net.




\begin{table*}[!htbp]
\caption{Training protocols.}
\label{table:training}
\begin{center}
% \resizebox{0.47\textwidth}{!}{
\begin{tabular}{ll} 
\hline
Network initialization         & \\
\hline
Batch size                    & 2 \\
\hline 
Patch size & 32$\times$128$\times$192  \\ 
\hline
Total epochs & 1500 \\
\hline
Optimizer          & SGD \\ \hline
Initial learning rate (lr)  & 0.01 \\ \hline
Lr decay schedule &  $(1 - epoch/1000)^{0.9}$ \\
\hline
Training time     & 36 hours \\  \hline 
Loss function & TAL (detailed in section 2.2)\\     \hline
Number of model parameters    & 5.64M\footnote{https://github.com/sksq96/pytorch-summary} \\ \hline
Number of flops & 8.13G\footnote{https://github.com/facebookresearch/fvcore} \\ \hline
CO$_2$eq & 5.3 Kg\footnote{https://github.com/lfwa/carbontracker/} \\  \hline
\end{tabular}
%}
\end{center}
\end{table*}


\section{Results and discussion}

\begin{table}[htbp]
\caption{Quantitative evaluation results. }
\label{tab:final-results}
\centering
\begin{tabular}{l|cc|cc|cc}
\hline
\multirow{2}{*}{Target} & \multicolumn{2}{c|}{Public Validation} & \multicolumn{2}{c|}{Online Validation} & \multicolumn{2}{c}{Testing} \\ \cline{2-7} 
                        & DSC(\%)            & NSD(\%)           & DSC(\%)            & NSD(\%)           & DSC(\%)      & NSD (\%)     \\ \hline
Liver                   & 95.59 $\pm$ $6.67$  &   90.31 $\pm$ 8.44   &   95.87  &     96.51                &    94.34        &   94.87   \\
Right Kidney            & 91.25 $\pm$ $10.14$ &  88.74 $\pm$ 10.13 &  90.41  &      91.93             &   92.28    &    93.55    \\
Spleen                  & 95.68 $\pm$ $3.71$ & 94.70 $\pm$ 6.32 &  95.62  &     96.96              &    95.48  &   97.07   \\
Pancreas                & 83.57 $\pm$ $7.76$ & 80.23 $\pm$ 11.67  &   82.13  &     94.21              &   86.50   &    95.64    \\
Aorta                   & 92.78 $\pm$ $5.08$ & 91.27 $\pm$ 8.07  &  94.19 &     96.96              &    90.90   &   94.44     \\
Inferior vena cava      & 89.33 $\pm$ $6.68$ & 83.14 $\pm$ 9.38  & 89.99 &     91.75              &   85.97  &   88.56    \\
Right adrenal gland     & 82.36 $\pm$ $3.50$ & 93.16 $\pm$ 3.90 &  80.97  &     94.04              &   75.62    &    88.03    \\
Left adrenal gland      & 79.58 $\pm$ $9.55$ & 89.94 $\pm$ 10.07  &  79.16  &     91.27              &   75.91    &    87.42     \\
Gallbladder             & 83.47 $\pm$ $13.53$ & 83.47 $\pm$ 13.53 &  78.99  &      78.26             &   76.84    &   78.29     \\
Esophagus               & 75.82 $\pm$ $17.90$ & 77.64 $\pm$ 16.59 &  79.04  &      90.48             &   83.94    &   94.11    \\
Stomach                 & 89.23 $\pm$ $9.50$ & 83.12 $\pm$ 15.55 &   88.78  &       92.23            &   83.61    &   97.10    \\
Duodenum                & 77.51 $\pm$ $10.38$ &  73.11 $\pm$ 11.33 &  77.36  &      91.75             &   78.25    &   91.50    \\
Left kidney             & 89.68 $\pm$ $14.61$ & 87.21 $\pm$ 15.70 &  90.69   &     91.88              &   92.03    &   93.39     \\
Tumor                   & 23.36 $\pm$ $25.43$ & 18.83 $\pm$ 21.51 &  19.41  &     12.25              &    24.88    &   14.91    \\ \hline
Average                   &  82.09 $\pm$ $17.42$ & 81.06 $\pm$ 11.59  &  81.62  &      86.46             &    81.18    &    85.63    \\ \hline
\end{tabular}
\end{table}




\subsection{Quantitative results on validation set}
The Dice and NSD scores of organs and tumors on the validation set are given in Table~\ref{tab:final-results}.

We conducted an ablation study to analyze the effect of unlabeled data. We trained a second network identical to the one described above, but using only the 2200 labeled cases. We divided these cases into two equal parts: the first 50\% used the official labels provided by the competition, and the remaining 50\% used pseudo labels generated by the FLARE22 winning algorithm~\cite{FLARE22-1st-Huang}. As expected, the model trained with unlabeled data performs better than the one trained without it. A network trained on both labeled and unlabeled data is exposed to more data during training, resulting in stronger generalization. The validation results of the model trained without unlabeled data are given in Table~\ref{tab:ab-results}.



\begin{table}[htbp]
\caption{Quantitative evaluation results of the model trained without unlabeled data. }
\label{tab:ab-results}
\centering
\begin{tabular}{l|cc|cc}
\hline
\multirow{2}{*}{Target} & \multicolumn{2}{c|}{Public Validation} & \multicolumn{2}{c}{Online Validation}\\ \cline{2-5} 
                        & DSC(\%)            & NSD(\%)          & DSC(\%)            & NSD(\%)\\ \hline
Liver                   &   95.62 $\pm$ $2.32$   &    87.53 $\pm$ $7.79$     &   95.71  &     94.21      \\
Right Kidney           &   91.64 $\pm$ $7.42$   &     86.56 $\pm$ $10.75$    &   89.93  &     89.99      \\
Spleen                 &   90.01 $\pm$ $11.75$   &     86.46 $\pm$ $11.41$   &   89.07  &     87.38       \\
Pancreas                &   80.75 $\pm$ $6.11$   &     75.89 $\pm$ $11.14$   &   78.74  &     90.38       \\
Aorta                   &   91.19 $\pm$ $6.41$   &     86.65 $\pm$ $10.95$   &   93.10  &     95.77       \\
Inferior vena cava      &   85.28 $\pm$ $6.67$   &     74.10 $\pm$ $9.62$    &   87.86  &     88.84      \\
Right adrenal gland     &   77.26 $\pm$ $6.15$   &     87.09 $\pm$ $6.49$    &   75.28  &     88.97      \\
Left adrenal gland      &   72.97 $\pm$ $12.22$   &     89.94 $\pm$ $10.07$  &   71.41  &     84.42        \\
Gallbladder             &   77.52 $\pm$ $19.74$   &     74.22 $\pm$ $21.76$  &   73.76  &     71.39        \\
Esophagus               &   71.32 $\pm$ $17.38$   &     71.31 $\pm$ $15.57$  &   74.86  &     86.97        \\
Stomach                 &   86.01 $\pm$ $10.75$   &     76.32 $\pm$ $17.93$  &   85.69  &     87.01        \\
Duodenum                &   68.93 $\pm$ $11.93$   &     61.00 $\pm$ $13.61$  &   69.55  &     88.03        \\
Left kidney             &   84.45 $\pm$ $20.49$   &     79.58 $\pm$ $18.90$  &   85.51  &     85.07        \\
Tumor                   &   12.49 $\pm$ $18.64$   &     11.51 $\pm$ $15.44$  &   11.83  &     6.69        \\ \hline
Average                 &   77.53 $\pm$ $11.28$   &     74.87 $\pm$ $12.96$  &   77.30  &     81.79        \\ \hline
\end{tabular}
\end{table}




\begin{table}[htbp]
\caption{Quantitative evaluation of segmentation efficiency in terms of the running time and GPU memory consumption. }
\centering
\label{tab:effiency}
\begin{tabular}{ccccc}
\hline
Case ID & Image Size      & Running Time (s) & Max GPU (MB) & Total GPU (MB) \\ \hline
0001    & (512, 512, 55)  & 23.46            & 1694         & 17257           \\
0051    & (512, 512, 100) & 19.13            &  1978        & 17698          \\
0017    & (512, 512, 150) & 35.94            &  2562        & 28826          \\
0019    & (512, 512, 215) & 23.33            &  1694        & 21224          \\
0099    & (512, 512, 334) & 29.93            &  2564        & 26540          \\
0063    & (512, 512, 448) & 37.86            &  1694        & 33508          \\
0048    & (512, 512, 499) & 41.66            &  1978        & 37977          \\
0029    & (512, 512, 554) & 52.81            &  1694        & 46037          \\ \hline
\end{tabular}
\end{table}


\subsection{Qualitative results on validation set}
Fig.~\ref{fig:seg} shows four examples of segmentation results on the validation set, two good and two bad. Our method visibly outperforms the ablation model, which we attribute to the better generalization of a model trained with more data. Case 0007 performed well on tumor segmentation but poorly on organ segmentation. Our analysis suggests that the model may have focused on the tumor and neglected the organ; in this example the tumor lies entirely on the surface of the liver, making the liver difficult to recognize. We believe case 0035 performed badly because tumors are spread all over the left kidney, a hard case in which the model fails to recognize both the left kidney and the tumor. For the two good cases, the likely reasons are that the tumor location is easier to recognize and the image is clearer.
\FloatBarrier

\begin{figure}[!htbp]
\centering
\includegraphics[scale=0.55]{imgs/examples.png}
\caption{The top two rows show good results, while the bottom two rows show bad results. Only labeled data are used in the ablation study.
}
\label{fig:seg}
\end{figure}



\subsection{Segmentation efficiency results on validation set}
The segmentation efficiency results for eight validation cases, measured in the hardware environment provided by the organizers, are shown in Table~\ref{tab:effiency}. We also calculated the average segmentation efficiency over all cases, with a mean running time of 25.34 seconds, a max GPU memory of 2317MB, and a total GPU memory of 23018MB. This time and memory consumption is low, which can be attributed to the lower computational complexity of the lightweight nnU-Net.

\subsection{Results on final testing set}
The results on the final testing set are given in Table~\ref{tab:final-results}.

\subsection{Limitation and future work}
The evaluation metrics of our method are not high, especially for tumor segmentation. A likely reason is that we have not fully utilized the unlabeled data, in particular the tumor information it contains. In future work, we will build on this foundation and try to make fuller use of unlabeled data.


\section{Conclusion}
In the FLARE23 challenge, we designed a model that combines a lightweight nnU-Net with a target adaptive loss to segment all organs and tumors in CT volumes while training on the partially labeled dataset. Although the results are not yet satisfactory, this work is the foundation of our future research, in which we will focus on making full use of the unlabeled data and the partially labeled dataset.


\subsubsection{Acknowledgements} The authors of this paper declare that the segmentation method they implemented for participation in the FLARE 2023 challenge has not used any pre-trained models nor additional datasets other than those provided by the organizers. The proposed solution is fully automatic without any manual intervention. We thank all the data owners for making the CT scans publicly available and CodaLab~\cite{codalab} for hosting the challenge platform. 


%
% ---- Bibliography ----
%
% BibTeX users should specify bibliography style 'splncs04'.
% References will then be sorted and formatted in the correct style.
%
\bibliographystyle{splncs04}
\bibliography{ref}

\newpage
% Please add the following required packages to your document preamble:
% \usepackage[normalem]{ulem}
% \useunder{\uline}{\ul}{}
\begin{table}[!htbp]
\caption{Checklist Table. Please fill out this checklist table in the answer column.}
\centering
\begin{tabular}{ll}
\hline
Requirements                                                                                                                    & Answer        \\ \hline
A meaningful title                                                                                                              & Yes       \\ \hline
The number of authors ($\leq$6)                                                                                                             & 4        \\ \hline
Author affiliations and ORCID                                                                                           & Yes        \\ \hline
Corresponding author email is presented                                                                                                  & Yes        \\ \hline
Validation scores are presented in the abstract                                                                                 & Yes        \\ \hline
\begin{tabular}[c]{@{}l@{}}Introduction includes at least three parts: \\ background, related work, and motivation\end{tabular} & Yes       \\ \hline
A pipeline/network figure is provided                                                                                           & 1 \\ \hline
Pre-processing                                                                                                                  & 3   \\ \hline
Strategies to use the partial label                                                                                             & 4   \\ \hline
Strategies to use the unlabeled images.                                                                                         & 4   \\ \hline
Strategies to improve model inference                                                                                           & 4   \\ \hline
Post-processing                                                                                                                 & 4   \\ \hline
Dataset and evaluation metric section is presented                                                                              & 5   \\ \hline
Environment setting table is provided                                                                                           & 1  \\ \hline
Training protocol table is provided                                                                                             & 2  \\ \hline
Ablation study                                                                                                                  & 6   \\ \hline
Efficiency evaluation results are provided                                                                                     & 5 \\ \hline
Visualized segmentation example is provided                                                                                     & 2 \\ \hline
Limitation and future work are presented                                                                                        & Yes        \\ \hline
Reference format is consistent.  & Yes       \\ \hline

\end{tabular}
\end{table}

\end{document}
