Multi-Stage Aggregation Transformer for Medical Image Segmentation

Published: 01 Jan 2023, Last Modified: 14 Nov 2024, Venue: ICASSP 2023, License: CC BY-SA 4.0
Abstract: Capturing rich multi-scale features is essential for handling the complex variations encountered in medical image segmentation. In this paper, we explore how to fully exploit the complementary advantages of convolutional neural networks (CNNs) and Transformers, and propose a novel multi-stage aggregation architecture, named MA-Transformer, for accurate segmentation of medical images with large variations and blur. Specifically, each stage contains an encoder module with a dual-branch structure that combines Transformers and convolutions in parallel. With this design, self-attention provides global context that helps the CNN extract complementary multi-resolution features stage by stage, so that the feature representations are gradually enriched with both local details and contextual information. The multi-scale semantic features are then combined through skip connections in the decoder to produce the final result. Extensive experiments on public medical imaging datasets demonstrate superior segmentation performance compared with state-of-the-art CNN-based, Transformer-based, and combined CNN-Transformer approaches. Code will be made publicly available.
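The abstract does not give implementation details, so the following is only a minimal NumPy sketch of what a parallel attention-plus-convolution encoder stage could look like. It is not the authors' MA-Transformer code. The function names, the single-head attention, the fusion by addition followed by ReLU, the 2x2 average pooling between stages, and all shapes are assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(feat, w_q, w_k, w_v):
    """Global branch: single-head self-attention over all H*W positions."""
    h, w, c = feat.shape
    tokens = feat.reshape(h * w, c)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(c), axis=-1)  # (H*W, H*W) attention map
    return (attn @ v).reshape(h, w, c)


def conv3x3(feat, kernel):
    """Local branch: zero-padded 3x3 convolution; kernel is (3, 3, C_in, C_out)."""
    h, w, _ = feat.shape
    padded = np.pad(feat, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, w, kernel.shape[-1]))
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w, :] @ kernel[dy, dx]
    return out


def dual_branch_block(feat, params):
    """Run both branches on the same input and fuse them (assumed: add + ReLU)."""
    global_ctx = self_attention(feat, params["w_q"], params["w_k"], params["w_v"])
    local = conv3x3(feat, params["kernel"])
    return np.maximum(global_ctx + local, 0.0)


def downsample(feat):
    """2x2 average pooling (assumed way of moving to the next resolution)."""
    h, w, c = feat.shape
    return feat.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))


def encode(feat, stage_params):
    """Apply one dual-branch block per stage; keep each output for skip connections."""
    skips = []
    for params in stage_params:
        feat = dual_branch_block(feat, params)
        skips.append(feat)
        feat = downsample(feat)
    return skips


rng = np.random.default_rng(0)


def init_params(c):
    # Hypothetical random weights; a real model would learn these.
    return {
        "w_q": 0.1 * rng.standard_normal((c, c)),
        "w_k": 0.1 * rng.standard_normal((c, c)),
        "w_v": 0.1 * rng.standard_normal((c, c)),
        "kernel": 0.1 * rng.standard_normal((3, 3, c, c)),
    }


stage_params = [init_params(4) for _ in range(3)]
skips = encode(rng.standard_normal((16, 16, 4)), stage_params)
print([s.shape for s in skips])  # [(16, 16, 4), (8, 8, 4), (4, 4, 4)]
```

In this sketch, the list `skips` holds one feature map per resolution, which is the kind of multi-scale output a decoder with skip connections would consume. The decoder itself is omitted because the abstract gives no detail on it.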