ResSwinUnet3D: Developing A New Residual-Based SwinUnet3D Model for Enhanced 3D Medical Image Segmentation

Published: 19 Aug 2025, Last Modified: 12 Oct 2025 · BHI 2025 · CC BY 4.0
Confirmation: I have read and agree with the IEEE BHI 2025 conference submission's policy on behalf of myself and my co-authors.
Keywords: 3D medical image segmentation, class activation maps, decoder, encoder, residual blocks, vanishing gradients, vision transformers
TL;DR: We propose ResSwinUnet3D, a residual-enhanced Swin Transformer architecture that achieves state-of-the-art performance in 3D medical image segmentation across multiple datasets.
Abstract: Accurate segmentation of 3D medical images remains a significant challenge due to complex anatomical variations, low contrast between adjacent structures, and the computational burden associated with volumetric data. Conventional deep learning models often encounter vanishing gradients and limited feature propagation in deep architectures, particularly when handling large-scale 3D volumes. To address these issues, this paper presents ResSwinUnet3D, a residual SwinUnet3D architecture for 3D medical image segmentation that combines vision transformers, convolutional neural networks, and residual connections. The proposed model extends the SwinUnet3D design by introducing residual blocks between the encoder and decoder components to mitigate the vanishing gradient problem and improve information flow through deep layers. Experiments were conducted on three datasets: BraTS 2020, BraTS 2021, and Synapse Multi-Organ CT Segmentation. On the BraTS 2020 dataset, our model achieved Dice Similarity Coefficients of 0.9170, 0.8539, and 0.8030 for the Whole Tumor, Tumor Core, and Enhancing Tumor regions, respectively. On the BraTS 2021 dataset, it achieved Dice scores of 0.9211, 0.9200, and 0.8924 for Whole Tumor, Tumor Core, and Enhancing Tumor, respectively. On the Synapse Multi-Organ CT Segmentation dataset, ResSwinUnet3D attained a mean Dice score of 0.8276 across 13 organ classes. With the integration of residual blocks, our model achieves a 5–20% overall improvement compared to SwinUNet3D and similar models such as Attention UNet and UNETR across the specified datasets and evaluation metrics. Gradient-weighted Class Activation Mapping (Grad-CAM) analyses further showed that residual connections produce interpretable activation maps, clarifying the model's decision process. These findings suggest that ResSwinUnet3D offers a robust and efficient solution for volumetric segmentation across diverse organs and imaging modalities.
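The two ideas the abstract leans on can be sketched compactly: a residual block adds an identity skip path so gradients bypass the transformation, and the Dice Similarity Coefficient measures volumetric overlap between a predicted and a reference mask. Below is a minimal NumPy illustration of both; the function names, the epsilon smoothing term, and the assumption of binary masks are illustrative choices, not details taken from the paper.

```python
import numpy as np

def residual_block(x, transform):
    """Identity skip connection: output = transform(x) + x.
    The additive identity path lets gradients flow unattenuated,
    which is the mechanism credited with easing vanishing gradients."""
    return transform(x) + x

def dice_score(pred, target, eps=1e-6):
    """Dice Similarity Coefficient between two binary volumes:
    DSC = 2 * |pred ∩ target| / (|pred| + |target|).
    eps is an illustrative smoothing term to avoid division by zero."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Example: two voxels predicted, one of them correct -> DSC = 2*1/(2+1) ≈ 0.667
print(dice_score([1, 1, 0, 0], [1, 0, 0, 0]))
```

In the actual model the `transform` would be a stack of 3D convolution, normalization, and activation layers; the per-region scores reported above (Whole Tumor, Tumor Core, Enhancing Tumor) are Dice values of this form computed per class.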
Track: 3. Imaging Informatics
Registration Id: 36NYQ5QVKR9
Submission Number: 347