CaVMamba: convolution-augmented VMamba for medical image segmentation

Qiaohong Chen, Zhenyang Xu, Xian Fang

Published: 2025, Last Modified: 31 Aug 2025. Venue: Vis. Comput. 2025. License: CC BY-SA 4.0

Abstract: Medical image segmentation is a crucial step in computer-aided clinical diagnosis. Existing methods, such as Convolutional Neural Networks (CNNs) and Vision Transformers, often struggle to capture comprehensive information or are computationally intensive. This study proposes CaVMamba, a novel model that enhances VMamba’s capabilities by leveraging the strong local feature extraction ability of convolution. CaVMamba adopts a sandwich structure that combines convolutional and VMamba features through Feed Forward Modules, yielding a more comprehensive representation. The model further incorporates a Dynamic Feature Fusion Module that selectively fuses multi-scale features, emphasizing relevant information while suppressing irrelevant details. Experiments on three medical image segmentation datasets show that CaVMamba achieves state-of-the-art performance with fewer parameters and lower computational cost. The open-source code and datasets are available to facilitate reproducibility and future research.

External IDs: dblp:journals/vc/ChenXF25