\section{Introduction}
Cone-Beam Computed Tomography (CBCT) provides comprehensive 3D information of the oral region and is an important imaging tool in dentistry, as shown by its rapid adoption in dental clinics \cite{clinical_cbct_review}. Precise segmentation of tooth and pulp structures is vital to applications such as the diagnosis of dental conditions, orthodontic procedures, and treatment and surgery planning \cite{cbct_use_tyndall2012,cbct_endo}. However, manual segmentation of CBCT scans requires specialized training and is extremely time-consuming, because each high-resolution scan contains a massive number of voxels and the variability across scans is high, making it impractical at scale. This motivates the development of effective semi-supervised approaches that require only limited labeled data while leveraging a large amount of unlabeled CBCT scans \cite{sdtooth,sts2023,ctooth_plus}.

Semi-supervised learning (SSL) incorporates elements from both supervised and unsupervised learning \cite{semisupervised,semisupervised2}, using labeled and unlabeled data together to improve performance on the supervised task by exploiting latent knowledge in the unlabeled data. This alleviates the need for a large number of labels, which can require considerable resources to obtain. We focus on three categories of SSL:
1) Knowledge transfer via pre-training, where autoencoders \cite{denoising_ae,mae,dae} are trained on a large amount of unlabeled data to reconstruct the original input from a corrupted version, guiding the randomly initialized model weights towards a potentially better region of the parameter space;
2) Consistency regularization \cite{cct,cr_ssl,cr_vae}, which is based on the smoothness assumption and encourages the model to produce similar outputs when the input, internal features, or model weights are perturbed, pushing the model towards better generalization; and
3) Pseudo labeling \cite{pseudolabel}, one of the most common SSL approaches owing to its simplicity and model-agnostic nature, in which the model's own predictions on unlabeled data serve as training targets. It acts as a form of entropy regularization \cite{entropy_reg} on unlabeled data, reducing the overlap between class probability distributions and favoring class boundaries that lie in low-density regions.
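
As a generic illustration of the last two categories, the following sketch may help; the notation is introduced here only for exposition and does not specify the exact losses used in this work. Let $f_\theta$ denote the segmentation model, $D_u$ the set of unlabeled scans, $\tilde{x}$ a perturbed version of an input $x$, and $d(\cdot,\cdot)$ a distance between predictions such as the mean squared error. A typical consistency regularization loss then takes the form
\begin{equation}
\mathcal{L}_{\mathrm{cons}} = \frac{1}{|D_u|} \sum_{x \in D_u} d\bigl(f_\theta(x), f_\theta(\tilde{x})\bigr),
\end{equation}
where perturbations of internal features or model weights can be handled analogously by perturbing the corresponding part of the forward pass instead of $x$. Pseudo labeling, applied per voxel in segmentation, instead converts the model's prediction into a hard target and trains against it:
\begin{equation}
\hat{y} = \arg\max_{c} f_\theta(x)_c, \qquad \mathcal{L}_{\mathrm{pl}} = \frac{1}{|D_u|} \sum_{x \in D_u} \ell\bigl(f_\theta(x), \hat{y}\bigr),
\end{equation}
with $\ell$ a classification loss such as cross-entropy. Under these assumptions, the unsupervised terms are commonly combined with a supervised loss $\mathcal{L}_{\mathrm{sup}}$ on the labeled data as $\mathcal{L} = \mathcal{L}_{\mathrm{sup}} + \lambda_{c}\,\mathcal{L}_{\mathrm{cons}} + \lambda_{p}\,\mathcal{L}_{\mathrm{pl}}$, where $\lambda_{c}$ and $\lambda_{p}$ are hypothetical weighting hyperparameters.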


In this paper, we present U-Mamba2-SSL, a multi-stage semi-supervised learning framework for tooth and pulp segmentation in 3D CBCT images, developed in the scope of the STSR 2025 Task 1 Challenge \cite{stsr2025}.
To exploit the vast amount of unlabeled CBCT data, we first pre-train U-Mamba2 \cite{umamba2} with a disruptive autoencoder on all provided data. In the second stage, the labeled data are used for supervised learning and the unlabeled data for unsupervised learning via consistency regularization in the input and feature spaces. The final stage adds pseudo labeling to the training procedure of the second stage, with a lower loss weight, to further refine the model weights.
Extensive experiments show that our method outperforms alternative approaches, and it achieved first place on the STSR 2025 hidden test set with an average score of 0.789.
