Hybrid Module with Multiple Receptive Fields and Self-Attention Layers for Medical Image Segmentation

Published: 2024 (ICASSP 2024) · Last Modified: 25 Jan 2026 · License: CC BY-SA 4.0
Abstract: Recent advances in medical image segmentation combine convolution with attention mechanisms, which provide an effective way to model long-range dependencies. However, most prior works either replace convolutional layers with attention layers or embed attention layers into convolutional neural network (CNN)-based models. To explore the potential of hybrid architectures, we propose a simple cascade module that builds up multiple receptive fields using convolutional kernels of different sizes and learns global context via self-attention layers. Benefiting from the strong representation ability of the proposed module, multilayer perceptrons (MLPs) with shift operations are adopted to bridge the encoder and decoder, reducing the model size without losing accuracy. Experiments show that our model consistently outperforms the latest 2D and 3D models by large margins on three public tasks and is more resilient to variations in shape, size, and boundary. The code is available at https://github.com/cicailalala/AERFNet.
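The abstract describes the core idea only at a high level: parallel convolutions with different kernel sizes build multiple receptive fields, and self-attention over the resulting feature map captures global context. The following is a minimal, illustrative NumPy sketch of that idea, not the authors' implementation; the function names (`conv2d_same`, `self_attention`, `hybrid_module`), the averaging kernels, and the identity attention projections are all simplifying assumptions for illustration.

```python
import numpy as np

def conv2d_same(x, k):
    """Single-channel 2D cross-correlation with zero 'same' padding."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def self_attention(tokens):
    """Single-head self-attention over tokens of shape (N, d).
    Identity Q/K/V projections are used for brevity; a real layer
    would learn projection matrices."""
    d = tokens.shape[1]
    scores = tokens @ tokens.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ tokens

def hybrid_module(x, kernel_sizes=(3, 5, 7)):
    """Hypothetical cascade: parallel convolutions with different
    receptive fields stacked as channels, then self-attention over
    spatial positions to learn global context."""
    feats = [conv2d_same(x, np.ones((k, k)) / (k * k)) for k in kernel_sizes]
    fmap = np.stack(feats, axis=-1)      # (H, W, C), one channel per kernel size
    H, W, C = fmap.shape
    tokens = fmap.reshape(H * W, C)      # one token per spatial position
    return self_attention(tokens).reshape(H, W, C)
```

A practical version would operate on multi-channel feature maps with learned convolution weights and multi-head attention; this sketch only demonstrates how local multi-scale features and global attention can be composed in one module.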