Keywords: U-Mamba2, CBCT Imaging, Dental Anatomy Segmentation, Deep Learning, ToothFairy3 Challenge
TL;DR: We present U-Mamba2, a new neural network architecture designed for multi-anatomy CBCT segmentation in the context of the ToothFairy3 challenge, integrating the Mamba2 SSM into the U-Net architecture.
Abstract: Cone-Beam Computed Tomography (CBCT) is a widely used 3D imaging technique in dentistry, providing volumetric information about the anatomical structures of jaws and teeth. Accurate segmentation of these anatomies is critical for clinical applications such as diagnosis and surgical planning, but remains time-consuming and challenging.
In this paper, we present U-Mamba2, a neural network architecture designed for multi-anatomy CBCT segmentation in the context of the ToothFairy3 challenge. U-Mamba2 integrates the Mamba2 state space models into the U-Net architecture, enforcing stronger structural constraints for higher efficiency without compromising performance.
In addition, we integrate interactive click prompts with cross-attention blocks, pre-train U-Mamba2 using self-supervised learning, and incorporate dental domain knowledge into the model design to address key challenges of dental anatomy segmentation in CBCT.
Extensive experiments, including independent tests, demonstrate that U-Mamba2 is both effective and efficient, securing first place in both tasks of the ToothFairy3 challenge.
In Task 1, U-Mamba2 achieved a mean Dice of 0.84 and an HD95 of 38.17 on the held-out test data, with an average inference time of 40.58 s. In Task 2, U-Mamba2 achieved a mean Dice of 0.87 and an HD95 of 2.15 on the held-out test data.
The code is publicly available at https://github.com/zhiqin1998/UMamba2.
Changes Summary: Response to Reviewer jVci
> Inference time discrepancy
Thank you for pointing out the discrepancy. It is due to the difference in GPU: 40.58 s was measured on an NVIDIA T4 on Grand Challenge, while 6.81 s was measured on an RTX 4090. We have added this explanation in Section 3.4.
> Point Encoder for SwinUNETR
We were unable to directly incorporate the point encoder into SwinUNETR due to GPU memory limits and the limited time available to optimize the model architecture during the competition.
> U-Mamba2 spelling
Thank you for pointing out the errors; we have corrected all the spelling mistakes.
Response to Reviewer qpmD
> Label smoothing
Thank you for the suggestion. We have added a mathematical formulation for the label smoothing process.
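For reference, the conventional label-smoothing formulation (we assume the standard form with smoothing factor $\epsilon$ over $C$ classes; the notation in the revised manuscript may differ):

$$\tilde{y}_c = (1-\epsilon)\, y_c + \frac{\epsilon}{C}, \qquad c = 1,\dots,C,$$

where $y_c$ is the one-hot target for class $c$ and $\tilde{y}_c$ is the smoothed target used in the loss.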
> Post-processing threshold
The thresholds are pre-computed once using the entire TF3 training set.
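For illustration only, a minimal sketch of one way such per-class thresholds could be pre-computed offline, assuming (purely for this example) that they are minimum connected-component volumes derived from the training labels; the actual criterion in the paper may differ, and all names below are ours:

```python
import numpy as np
from scipy import ndimage

def precompute_min_volumes(label_volumes, num_classes, fraction=0.5):
    """Per-class minimum component volume (in voxels), taken as a fraction of the
    smallest component observed for that class across the training segmentations.
    ASSUMPTION: thresholds are minimum connected-component volumes."""
    min_vol = {c: np.inf for c in range(1, num_classes)}
    for seg in label_volumes:                     # iterate over training label maps
        for c in range(1, num_classes):
            comps, n = ndimage.label(seg == c)    # connected components of class c
            for i in range(1, n + 1):
                min_vol[c] = min(min_vol[c], int((comps == i).sum()))
    return {c: fraction * v for c, v in min_vol.items() if np.isfinite(v)}
```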
> Inference tiles
We extended Section 3.3 to describe the tile size parameter in sliding window inference and clarified in Section 3 the voxel spacing used in the experiments.
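For illustration, a generic sketch of how tiles are enumerated in sliding-window inference (this is not the exact routine used in our pipeline; the names and the 50% overlap are assumptions):

```python
import numpy as np

def sliding_window_tiles(volume_shape, tile_size, overlap=0.5):
    """Yield (z, y, x) start indices of 3D tiles covering the volume with the given overlap."""
    step = [max(1, int(t * (1 - overlap))) for t in tile_size]
    starts = []
    for size, tile, s in zip(volume_shape, tile_size, step):
        last = max(size - tile, 0)
        idx = list(range(0, last + 1, s))
        if idx[-1] != last:          # ensure the final tile touches the volume border
            idx.append(last)
        starts.append(idx)
    for z in starts[0]:
        for y in starts[1]:
            for x in starts[2]:
                yield (z, y, x)

# Usage: iterate tiles of a TF3-sized volume.
volume = np.zeros((298, 512, 512), dtype=np.float32)
tile = (160, 160, 160)
for z, y, x in sliding_window_tiles(volume.shape, tile):
    patch = volume[z:z + tile[0], y:y + tile[1], x:x + tile[2]]
    # logits = model(patch)  # run the network on each patch and average overlapping logits
```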
> Missing citation
Thank you for pointing this out; we have fixed the citations for the dataset papers.
Response to Reviewer Ac5V
> Novelty claim
Thank you for pointing this out. We have rephrased our contribution claims and modified Section 2.1 to clarify the network contribution and to appropriately acknowledge U-Mamba.
> Mamba2 justification:
We clarify that the comparative experiment between Mamba2 and the original Mamba layer is presented in Table 1, where Row 3 (U-Mamba) has a similar architecture to our U-Mamba2 but uses the original Mamba block. We have added a sentence to clarify this in Section 3.1.
> Memory consumption details:
The maximum CBCT scan resolution in ToothFairy3 is only 298x512x512, which requires less than 8 GB to store all logits in 16-bit precision. If a scan is too large for the logits to fit in available RAM, tools such as numpy.memmap can be used to offload the large arrays onto disk.
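As an illustration of this memmap fallback, a minimal sketch (the class count, shape, and file name are illustrative, not taken from the paper):

```python
import numpy as np

num_classes, depth, height, width = 50, 298, 512, 512  # illustrative TF3-sized volume
# Disk-backed float16 array; written predictions are flushed to disk instead of held in RAM.
logits = np.memmap("logits.dat", dtype=np.float16, mode="w+",
                   shape=(num_classes, depth, height, width))
# ... accumulate sliding-window predictions into `logits` as usual ...
logits.flush()
```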
> Postprocessing limitations:
Thank you for the suggestion. We agree that the current post-processing method is simplistic and only works well when false positives are confined to background voxels. However, due to time constraints, we leave the investigation of more intelligent post-processing methods to future research.
> Postprocessing improvement:
As shown in Table 1, all models improve their Dice score by 0.02-0.03 when post-processing is applied in Task 1. Because the post-processing is model-agnostic, it does not benefit U-Mamba2 significantly more than the other models.
Latex Source Code: zip
Main Tex File: main.tex
Confirm Latex Only: true
Code Url: https://github.com/zhiqin1998/UMamba2
Authors Changed: false
Arxiv Update Plans: Yes, https://arxiv.org/abs/2509.12069
Copyright: pdf
Submission Number: 17