U-Mamba2: Scaling State Space Models for Dental Anatomy Segmentation in CBCT

15 Sept 2025 (modified: 17 Nov 2025) · MICCAI 2025 Workshop ODIN Submission · CC BY 4.0
Keywords: U-Mamba2, CBCT Imaging, Dental Anatomy Segmentation, Deep Learning, ToothFairy3 Challenge
TL;DR: We present U-Mamba2, a new neural network architecture designed for multi-anatomy CBCT segmentation in the context of the ToothFairy3 challenge, integrating Mamba2 SSM in the U-Net architecture.
Abstract: Cone-Beam Computed Tomography (CBCT) is a widely used 3D imaging technique in dentistry, providing volumetric information about the anatomical structures of jaws and teeth. Accurate segmentation of these anatomies is critical for clinical applications such as diagnosis and surgical planning, but remains time-consuming and challenging. In this paper, we present U-Mamba2, a neural network architecture designed for multi-anatomy CBCT segmentation in the context of the ToothFairy3 challenge. U-Mamba2 integrates Mamba2 state space models into the U-Net architecture, enforcing stronger structural constraints for higher efficiency without compromising performance. In addition, we integrate interactive click prompts with cross-attention blocks, pre-train U-Mamba2 using self-supervised learning, and incorporate dental domain knowledge into the model design to address key challenges of dental anatomy segmentation in CBCT. Extensive experiments, including independent tests, demonstrate that U-Mamba2 is both effective and efficient, securing first place in both tasks of the ToothFairy3 challenge. In Task 1, U-Mamba2 achieved a mean Dice of 0.84 and an HD95 of 38.17 on the held-out test data, with an average inference time of 40.58 s. In Task 2, U-Mamba2 achieved a mean Dice of 0.87 and an HD95 of 2.15 on the held-out test data. The code is publicly available at https://github.com/zhiqin1998/UMamba2.
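To make the architectural idea concrete, below is a minimal sketch (not the authors' exact code) of how a Mamba2 block can follow a convolutional stage in a 3D U-Net encoder, in the spirit of the U-Mamba pattern of flattening spatial voxels into a token sequence. Layer choices and hyperparameters here are illustrative assumptions; the paper's settings may differ.

import torch
import torch.nn as nn
from mamba_ssm import Mamba2  # pip install mamba-ssm (requires a CUDA GPU)

class ConvMamba2Block(nn.Module):
    """Conv stage followed by residual Mamba2 mixing over flattened voxels."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm3d(channels),
            nn.LeakyReLU(inplace=True),
        )
        self.norm = nn.LayerNorm(channels)
        # Illustrative SSM hyperparameters; with Mamba2's default headdim=64,
        # `channels` should be a multiple of 32 so expand * channels % 64 == 0.
        self.mamba = Mamba2(d_model=channels, d_state=64, d_conv=4, expand=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.conv(x)                                  # (B, C, D, H, W)
        b, c, d, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)             # (B, D*H*W, C)
        tokens = tokens + self.mamba(self.norm(tokens))   # residual SSM mixing
        return tokens.transpose(1, 2).reshape(b, c, d, h, w)

Scanning the flattened voxel sequence with a state space model gives each stage long-range context at a cost linear in sequence length, which is the efficiency argument for pairing SSM blocks with convolutions in volumetric segmentation.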
Changes Summary: Response to Reviewer Ac5V
- Novelty claim: Thank you for pointing this out; we have rephrased our contribution claims and modified Section 2.1 to clarify the network contribution and to acknowledge U-Mamba appropriately.
- Mamba2 justification: We clarify that the comparative experiment between Mamba2 and the original Mamba layer is presented in Table 1, where Row 3 (U-Mamba) has an architecture similar to our U-Mamba2 but uses the original Mamba block. We have added a sentence to Section 3.1 to clarify this.
- Memory consumption details: The maximum CBCT scan resolution in ToothFairy3 is only 298x512x512, which requires less than 8 GB to store all logits in 16-bit precision. If a scan's resolution exceeded the available RAM, tools such as numpy.memmap could be used to offload the large arrays to disk (see the sketch below).
- Post-processing limitations: Thank you for the suggestion. We agree that the current post-processing method is simplistic and works well only when false positives are confined to background voxels. Due to time constraints, we leave the investigation of a more intelligent post-processing method to future research.
- Post-processing improvement: As shown in Table 1, all models improve their Dice score by 0.02 to 0.03 when post-processing is applied in Task 1. The post-processing method does not improve U-Mamba2's performance significantly more than the other models', as it is model-agnostic.
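As a minimal sketch of the memmap fallback mentioned above, the snippet below accumulates full-volume logits in a disk-backed array so peak RAM stays bounded. The file path, class count, and volume shape are illustrative assumptions, not values from the paper.

import numpy as np

# Illustrative sizes: the label count depends on the task; the spatial shape
# matches the largest ToothFairy3 scan mentioned in the response above.
num_classes, depth, height, width = 50, 298, 512, 512

# Disk-backed float16 logits buffer; the OS pages data in and out on demand.
logits = np.memmap(
    "logits.dat", dtype=np.float16, mode="w+",
    shape=(num_classes, depth, height, width),
)

# ... sliding-window inference writes each patch's logits into `logits` here ...

logits.flush()
# Argmax streams the buffer from disk; chunking along the depth axis would
# reduce memory further for even larger volumes.
segmentation = np.argmax(logits, axis=0).astype(np.uint8)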
Latex Source Code: zip
Main Tex File: main.tex
Confirm Latex Only: true
Code Url: https://github.com/zhiqin1998/UMamba2
Authors Changed: false
Arxiv Update Plans: Yes, https://arxiv.org/abs/2509.12069
Submission Number: 17