Abstract: Convolutional Neural Networks (CNNs) and
Transformer-based self-attention models have become
the standard for medical image segmentation. This
paper demonstrates that convolution and self-attention,
while widely used, are not the only effective methods for
segmentation. Breaking with convention, we present a
Convolution and self-Attention-free Mamba-based semantic Segmentation Network named CAMS-Net. Specifically,
we design a Mamba-based Channel Aggregator and a Mamba-based Spatial
Aggregator, which are applied independently at each
encoder-decoder stage. The Channel Aggregator extracts
information across channels, and the Spatial Aggregator learns features across spatial locations.
We also propose a Linearly Interconnected Factorized Mamba (LIFM) block to reduce the computational complexity of a Mamba block and to enhance its decision
function by introducing a non-linearity between two factorized Mamba blocks. Our model outperforms existing
state-of-the-art CNN-, self-attention-, and Mamba-based
methods on the CMR and M&Ms-2 cardiac segmentation
datasets while achieving linear complexity and reducing
the number of parameters, showing how this convolution-
and self-attention-free approach can inspire further research
beyond the CNN and Transformer paradigms.
Source code and pre-trained models are available at:
https://github.com/kabbas570/CAMS-Net.