Coupled Swin Transformers and Multi-Apertures Network (CSTA-NET) Improves Medical Image Segmentation

Siyavash Shabani, Sahar A. Mohammed, Muhammad Sohaib, Bahram Parvin

Published: 01 Apr 2025. Last Modified: 16 Oct 2025. Venue: 2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI). License: CC BY 4.0

Abstract: Vision Transformers have outperformed traditional convolution-based frameworks across a range of visual tasks, including the segmentation of 3D medical images. To advance this area further, this study introduces the Coupled Swin Transformers and Multi-Apertures Networks (CSTA-Net), which integrates the output of each Swin Transformer with an aperture network. Each aperture network consists of a convolution and a fusion block for combining global and local feature maps. The proposed model was evaluated on two independent datasets, showing that it delineates fine details. Trained on the Synapse multi-organ and ACDC datasets, the proposed architecture achieved average Dice scores of 90.19±0.05 and 93.77±0.04, respectively. The code is available at https://github.com/Siyavashshabani/CSTANet.
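The abstract describes the aperture network only as "a convolution and a fusion block" that combines global and local feature maps. The NumPy sketch below shows one plausible reading of that description, under assumptions that are not stated in the abstract: feature maps use a (channels, depth, height, width) layout; the local branch is a single zero-padded 3D convolution followed by ReLU; and fusion is channel concatenation followed by a 1×1×1 convolution. The names `conv3d_same` and `aperture_block` are hypothetical, and the released CSTA-Net code may fuse features differently.

```python
import numpy as np


def conv3d_same(x, w):
    """Stride-1, zero-padded 3D convolution (cross-correlation, as in deep-learning layers).

    x: input feature map, shape (C_in, D, H, W)
    w: kernel, shape (C_out, C_in, k, k, k) with odd k
    returns: output of shape (C_out, D, H, W), same spatial size as x
    """
    c_out, c_in, k = w.shape[0], w.shape[1], w.shape[2]
    assert x.shape[0] == c_in and k % 2 == 1
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p), (p, p)))  # zero padding on spatial axes only
    d, h, wd = x.shape[1:]
    out = np.zeros((c_out, d, h, wd))
    # Accumulate one kernel tap at a time: each tap is a channel-mixing
    # matrix product applied to a shifted view of the padded input.
    for dz in range(k):
        for dy in range(k):
            for dx in range(k):
                patch = xp[:, dz:dz + d, dy:dy + h, dx:dx + wd]
                out += np.einsum("oc,cdhw->odhw", w[:, :, dz, dy, dx], patch)
    return out


def relu(x):
    return np.maximum(x, 0.0)


def aperture_block(global_feat, local_input, w_conv, w_fuse):
    """HYPOTHETICAL aperture block: convolutional local branch fused with a global map.

    global_feat: Swin Transformer output, shape (C_g, D, H, W)  [assumed layout]
    local_input: input to the convolutional branch, shape (C_in, D, H, W)
    w_conv:      local conv kernel, shape (C_l, C_in, k, k, k)
    w_fuse:      1x1x1 fusion weights, shape (C_out, C_g + C_l)
    """
    local_feat = relu(conv3d_same(local_input, w_conv))          # local features
    stacked = np.concatenate([global_feat, local_feat], axis=0)   # channel concatenation
    fused = np.einsum("oc,cdhw->odhw", w_fuse, stacked)          # 1x1x1 convolution
    return relu(fused)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g = rng.standard_normal((8, 4, 4, 4))   # stand-in for a transformer output
    x = rng.standard_normal((2, 4, 4, 4))   # stand-in for the local-branch input
    out = aperture_block(g, x, rng.standard_normal((4, 2, 3, 3, 3)),
                         rng.standard_normal((6, 12)))
    print(out.shape)  # (6, 4, 4, 4)
```

Concatenation followed by a pointwise convolution is a minimal, common fusion choice; additive or attention-based fusion would be equally valid readings of the abstract, and a trainable version would use a framework layer such as a 3D convolution module rather than explicit loops.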