Multi-Scale Attention Fusion with Lesion-Area Focus for Knowledge-Enhanced Dermoscopic Skin Lesion Classification

Danjun Wang, Qingyang Liu, Yanrong Hu, Hongjiu Liu

Published: 01 Dec 2025, Last Modified: 13 Mar 2026, Venue: Applied Sciences, Readers: Everyone, License: CC BY-SA 4.0
Abstract: Skin diseases are common and pose a significant threat to human health, and automated classification plays an important role in assisting clinical diagnosis. However, existing image classification approaches based on convolutional neural networks (CNNs) and Transformers have inherent limitations: CNNs are limited in capturing global features, whereas Transformers are less effective at modeling local details. In dermoscopic images, local and global features are both crucial for classification. To address this issue, we propose an improved Swin Transformer-based model, termed MaLafFormer, which incorporates a Modulated Fusion of Multi-scale Attention (MFMA) module and a Lesion-Area Focus (LAF) module to enhance global modeling, emphasize critical local regions, and improve lesion boundary perception. Experimental results on the ISIC2018 dataset show that MaLafFormer achieves an accuracy of 84.35% ± 0.56% (mean of three runs), outperforming the baseline (77.98% ± 0.34%) by 6.37 percentage points, and surpasses the other compared methods across multiple metrics, validating its effectiveness for skin lesion classification.