Coupled Swin Transformers and Multi-Apertures Network (CSTA-NET) Improves Medical Image Segmentation
Abstract: Vision Transformers have outperformed traditional convolution-based frameworks across a range of visual tasks, including the segmentation of 3D medical images. To advance this area further, this study introduces the Coupled Swin Transformers and Multi-Apertures Network (CSTA-Net), which couples the output of each Swin Transformer with an aperture network. Each aperture network consists of a convolution and a fusion block that combines global and local feature maps. The proposed model was evaluated on two independent datasets to show that it delineates fine details. Trained on the Synapse multi-organ and ACDC datasets, it achieved average Dice scores of 90.19±0.05 and 93.77±0.04, respectively. The code is available at https://github.com/Siyavashshabani/CSTANet.
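For readers unfamiliar with the reported metric, the following is a minimal sketch of the standard Dice score, Dice = 2|A∩B| / (|A| + |B|), averaged over labels. It is not taken from the CSTA-Net repository: the function names `dice_score` and `mean_dice`, the flat-list input format, and the convention of returning 1.0 when a label is absent from both masks are all assumptions made for illustration. The abstract does not state the paper's exact evaluation protocol.

```python
def dice_score(pred, target, label):
    """Dice overlap for one label over two flat sequences of voxel labels."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of voxels")
    pred_count = sum(1 for p in pred if p == label)
    target_count = sum(1 for t in target if t == label)
    overlap = sum(1 for p, t in zip(pred, target) if p == label and t == label)
    denom = pred_count + target_count
    if denom == 0:
        # Assumed convention: a label missing from both masks counts as a perfect match.
        return 1.0
    return 2.0 * overlap / denom


def mean_dice(pred, target, labels):
    """Unweighted average of per-label Dice scores (background excluded by the caller)."""
    return sum(dice_score(pred, target, c) for c in labels) / len(labels)


# Toy 8-voxel example with background 0 and two foreground labels.
pred = [0, 1, 1, 2, 2, 2, 0, 0]
target = [0, 1, 1, 1, 2, 2, 2, 0]
print(dice_score(pred, target, 1))
print(mean_dice(pred, target, [1, 2]))
```

The functions return values in [0, 1]; scores such as those quoted in the abstract correspond to this value multiplied by 100.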