Keywords: Swin-UNet, Dual Swin-UNet, Transformer-based Segmentation, Dental Image Segmentation, Medical Image Analysis, Multiclass Segmentation
Abstract: Transformer-based architectures have reshaped the landscape of medical image segmentation, yet their application to high-resolution dental imaging remains underexplored. In this work, we study a Dual Swin-UNet framework that uses two complementary stages of Swin Transformer processing to segment dental structures more effectively. The method begins by dividing full-resolution dental radiographs into spatially consistent slices using a lightweight Swin-driven partitioning module. Each slice is then independently analyzed by a Swin-UNet encoder-decoder network, allowing the model to focus on localized anatomical details without sacrificing global context. Before segmentation, we apply Contrast Limited Adaptive Histogram Equalization (CLAHE) to enhance subtle boundaries, which yields a performance gain of nearly 10\%. After inference, all slice-level predictions are merged to reconstruct a complete segmentation map at the original resolution. Experimental results indicate that this two-stage design substantially improves segmentation quality compared to a standard Swin-UNet, particularly in challenging low-contrast regions. Our approach achieves state-of-the-art performance with a mean Dice score of up to 88\% for tooth segmentation. These findings highlight the value of Transformer-based slicing strategies for detailed dental image analysis and demonstrate their potential to support clinical diagnostics and treatment planning.
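The slice, segment, and merge flow described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions: the fixed vertical split stands in for the Swin-driven partitioning module, `placeholder_segmenter` is a hypothetical threshold stand-in for the Swin-UNet, and all function names are illustrative rather than taken from the submission. CLAHE preprocessing is omitted to keep the example dependency-free beyond NumPy.

```python
import numpy as np

def split_into_slices(image, n_slices):
    """Split an (H, W) image into vertical strips of near-equal width.

    Assumption: a fixed grid replaces the learned partitioning module.
    """
    return np.array_split(image, n_slices, axis=1)

def placeholder_segmenter(tile):
    """Hypothetical stand-in for the per-slice network: a simple threshold."""
    return (tile > 0.5).astype(np.int64)

def merge_slices(slice_masks):
    """Concatenate per-slice label maps back to the original width."""
    return np.concatenate(slice_masks, axis=1)

def mean_dice(pred, target, num_classes):
    """Mean Dice over foreground classes (class 0 is treated as background)."""
    scores = []
    for c in range(1, num_classes):
        p = pred == c
        t = target == c
        denom = p.sum() + t.sum()
        if denom == 0:
            continue  # class absent in both maps; skip it
        scores.append(2.0 * np.logical_and(p, t).sum() / denom)
    return float(np.mean(scores)) if scores else float("nan")

def segment_full_image(image, n_slices):
    """Slice the image, segment each slice independently, then merge."""
    tiles = split_into_slices(image, n_slices)
    masks = [placeholder_segmenter(t) for t in tiles]
    return merge_slices(masks)
```

Because the slices here do not overlap, merging is a plain concatenation; an overlapping scheme would need a blending or voting rule at the seams.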
Primary Subject Area: Segmentation
Secondary Subject Area: Learning with Noisy Labels and Limited Data
Registration Requirement: Yes
Visa & Travel: Yes
Read CFP & Author Instructions: Yes
Originality Policy: Yes
Single-blind & Not Under Review Elsewhere: Yes
LLM Policy: Yes
Submission Number: 365