SVMB-Net: Local Global Fusion and Multi-Branch Cross-Feature Attention for Skin Lesion Segmentation

Yuan Zhao, Jinlai Zhang, Wujiao He, Sheng Wu

Published: 01 Jan 2025, Last Modified: 01 Apr 2026, IEEE Journal of Biomedical and Health Informatics, CC BY-SA 4.0
Abstract: Early detection and surgical treatment are crucial for the successful cure of skin cancer, but accurate segmentation of skin lesions remains a key challenge for early diagnosis. Lesions vary widely in shape, size, color, and texture, and images are further complicated by hair, low contrast, uneven brightness, and irregular boundaries. To address these challenges, we propose SVMB-Net, a dual-architecture framework that integrates Swin Transformer and CNN with two main innovations. First, the Super ViT-CNN (SViT-C) hybrid encoder employs a dedicated global restoration module to extract high-level semantics, while its dual-branch fusion module combines the CNN's local feature extraction with the Swin Transformer's global context modelling. Second, our multi-branch deep cross-feature attention decoder introduces a multi-scale attention mechanism. Comprehensive evaluations on three clinical datasets show significant improvements: on ISIC2018, SVMB-Net improves $DSC$ by 7.67% to 93.88% and $ACC$ by 2.21% to 96.97% over the current state-of-the-art segmentation method DINOv2. Experiments on ISIC2017 and $PH^{2}$ show an $IoU$ of 83.45% and an $ACC$ of 97.08%, substantially outperforming 16 existing methods such as SAM2-UNet and VM-UNet. The architecture provides a powerful solution for automated lesion analysis in real-world clinical settings. Our code will be open sourced at https://github.com/Sleepearlyy/SVMB-Net.git.
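For readers unfamiliar with the reported metrics, the following is a minimal sketch of how $DSC$, $IoU$ and $ACC$ are commonly computed for binary segmentation masks. It assumes the standard pixel-wise definitions; the abstract does not state the exact evaluation protocol, and the function name `segmentation_metrics` and the toy masks are hypothetical and not taken from the SVMB-Net code.

```python
def segmentation_metrics(pred, target):
    """Pixel-wise DSC, IoU and ACC for two binary masks (nested lists of 0/1).

    Assumes the usual definitions (not confirmed by the paper's abstract):
      DSC = 2*TP / (2*TP + FP + FN)
      IoU = TP / (TP + FP + FN)
      ACC = (TP + TN) / (TP + TN + FP + FN)
    """
    tp = fp = fn = tn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1          # lesion pixel correctly predicted
            elif p and not t:
                fp += 1          # background predicted as lesion
            elif t:
                fn += 1          # lesion pixel missed
            else:
                tn += 1          # background correctly predicted

    total = tp + fp + fn + tn
    dice_den = 2 * tp + fp + fn
    iou_den = tp + fp + fn
    return {
        # Two empty masks are treated as a perfect match.
        "DSC": 2 * tp / dice_den if dice_den else 1.0,
        "IoU": tp / iou_den if iou_den else 1.0,
        "ACC": (tp + tn) / total if total else 1.0,
    }


# Toy 3x3 masks, purely illustrative.
pred = [[1, 1, 0],
        [0, 1, 0],
        [0, 0, 0]]
target = [[1, 1, 0],
          [0, 0, 0],
          [0, 0, 1]]

metrics = segmentation_metrics(pred, target)
print(metrics)
```

Because $ACC$ also counts correctly predicted background pixels, it tends to be higher than $DSC$ or $IoU$ when lesions occupy a small part of the image, which is consistent with the abstract reporting $ACC$ values above the overlap-based scores.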