Abstract: Few-shot learning has attracted considerable attention in medical image semantic segmentation because annotated medical images are scarce. Existing studies use Prototypical Networks (PN) to perform segmentation with a limited number of labeled medical images and have done so successfully. However, these approaches overlook the pairwise relations between query and support images and the spatial structure of the feature maps. Multi-level feature correlations address both problems: aggregating them with 4D convolutions captures high-level semantic and low-level geometric cues. In this paper, we propose a novel network architecture, the Gated Multi-scale Attention Transformer (GMAT), which combines a 4D convolutional Swin Transformer network with pyramidal aggregation and a gated multi-scale attention mechanism. Extensive experiments on an abdominal CT dataset demonstrate the superiority of our model over existing state-of-the-art methods.