Gated Multi-Scale Attention Transformer For Few-Shot Medical Image Segmentation

Zhenghao Zhao, Hao Ding, Dawen Cai, Yan Yan

Published: 01 Jan 2024, Last Modified: 14 Nov 2024, ISBI 2024, License: CC BY-SA 4.0

Abstract: Few-shot learning has gained considerable attention in medical image semantic segmentation because annotated medical images are scarce. Existing studies use Prototypical Networks (PN) to segment medical images from only a few labeled examples and have been successful. However, these approaches overlook the pairwise relations between query and support images as well as the spatial structure of the feature maps. Multi-level feature correlations address these problems by capturing high-level semantic and low-level geometric cues, which are then aggregated with 4D convolutions. In this paper, we propose a novel network architecture named Gated Multi-scale Attention Transformer (GMAT), which uses a 4D convolutional Swin Transformer network with pyramidal aggregation and a gated multi-scale attention mechanism. Extensive experiments on the abdominal CT dataset demonstrate the superiority of our model over existing state-of-the-art methods.
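To make the contrast between prototype matching and multi-level feature correlation concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not GMAT's implementation: the function names, the cosine similarity clamped at zero, the nearest-neighbour mask resizing, and the feature sizes are hypothetical choices. The 4D-convolutional Swin aggregation and the gated multi-scale attention that GMAT applies on top of such tensors are not reproduced.

```python
import numpy as np

EPS = 1e-8


def masked_average_prototype(support_feat, support_mask):
    """PN-style prototype: average support features (C, H, W) over foreground pixels."""
    mask = support_mask[None, :, :].astype(support_feat.dtype)
    return (support_feat * mask).sum(axis=(1, 2)) / (mask.sum() + EPS)


def prototype_similarity(query_feat, prototype):
    """Cosine similarity of each query pixel to one prototype -> (Hq, Wq)."""
    q = query_feat / (np.linalg.norm(query_feat, axis=0, keepdims=True) + EPS)
    p = prototype / (np.linalg.norm(prototype) + EPS)
    return np.einsum("chw,c->hw", q, p)


def resize_mask_nearest(mask, h, w):
    """Nearest-neighbour resize of a binary mask to (h, w) (assumed resizing scheme)."""
    rows = np.arange(h) * mask.shape[0] // h
    cols = np.arange(w) * mask.shape[1] // w
    return mask[np.ix_(rows, cols)]


def correlation_4d(query_feat, support_feat, support_mask):
    """Pairwise cosine similarity between every query position and every
    foreground support position -> (Hq, Wq, Hs, Ws), clamped at zero (assumption)."""
    s = support_feat * support_mask[None, :, :]  # background support vectors become zero
    q = query_feat / (np.linalg.norm(query_feat, axis=0, keepdims=True) + EPS)
    s = s / (np.linalg.norm(s, axis=0, keepdims=True) + EPS)
    return np.maximum(np.einsum("cij,ckl->ijkl", q, s), 0.0)


def multi_level_correlations(query_feats, support_feats, support_mask):
    """One 4D correlation tensor per feature level (hypothetical helper)."""
    corrs = []
    for q, s in zip(query_feats, support_feats):
        m = resize_mask_nearest(support_mask, s.shape[1], s.shape[2])
        corrs.append(correlation_4d(q, s, m))
    return corrs


# Usage with random features at two hypothetical levels: (channels, spatial size).
rng = np.random.default_rng(0)
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1
levels = [(16, 8), (32, 4)]
qs = [rng.standard_normal((c, n, n)) for c, n in levels]
ss = [rng.standard_normal((c, n, n)) for c, n in levels]

corrs = multi_level_correlations(qs, ss, mask)
print([c.shape for c in corrs])  # [(8, 8, 8, 8), (4, 4, 4, 4)]
proto_map = prototype_similarity(qs[0], masked_average_prototype(ss[0], mask))
print(proto_map.shape)  # (8, 8)
```

The prototype map keeps a single score per query pixel, whereas each 4D tensor keeps a score for every (query position, support position) pair. That pairwise, spatially resolved information is what the abstract says PN-based methods discard, and it is the kind of input that 4D convolutions are designed to aggregate.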