A Semantic Segmentation Method for Skin Lesion Images Based on ViT

Published: 2024, Last Modified: 30 Dec 2025, ICONIP (8) 2024, CC BY-SA 4.0
Abstract: Regular monitoring is paramount in the prevention of melanoma, a type of skin cancer. A highly effective detection method uses image segmentation models to delineate areas of skin pathology. This approach helps medical professionals accurately identify abnormal regions, thereby facilitating precise diagnosis and treatment. However, the scarcity of expertly annotated datasets makes it difficult to train effective image segmentation models directly. To address this issue, this paper introduces EffiSeg, a semantic segmentation method for skin lesion images based on Vision Transformers (ViT). EffiSeg leverages the strengths of models pre-trained on extensive datasets and incorporates lightweight Adapter Modules (AM) within the Transformer architecture. This integration enables the transfer of features from Convolutional Neural Networks (CNNs) to ViT through knowledge distillation, thereby enhancing the model’s capability for information extraction. In addition, fine-tuning is applied to improve the model’s adaptability and segmentation performance. Experimental results show that EffiSeg outperforms alternative approaches in segmentation performance on datasets with limited samples. It achieves this by reducing the number of trainable parameters and improving generalization, making it a robust and promising model for practical application in the field.
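To give a rough sense of why lightweight adapters keep the number of trainable parameters small, the following back-of-the-envelope sketch compares one bottleneck adapter per Transformer block with the frozen block itself. The abstract does not specify the adapter design or any sizes, so everything here is an assumption for illustration: the bottleneck structure (down-projection, up-projection), one adapter per block, the ViT-Base-like dimensions (`DIM`, `DEPTH`), the `BOTTLENECK` width, and all function names.

```python
# Illustrative only: the abstract does not give the adapter design or sizes.
# Assumes a generic bottleneck adapter and ViT-Base-like dimensions.

def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def adapter_params(dim: int, bottleneck: int) -> int:
    """Hypothetical bottleneck adapter: dim -> bottleneck -> dim."""
    return linear_params(dim, bottleneck) + linear_params(bottleneck, dim)


def vit_block_params(dim: int, mlp_ratio: int = 4) -> int:
    """One standard Transformer encoder block: attention, MLP, two LayerNorms."""
    # Fused query/key/value projection plus the attention output projection.
    attention = linear_params(dim, 3 * dim) + linear_params(dim, dim)
    # Two-layer MLP with a hidden width of mlp_ratio * dim.
    mlp = linear_params(dim, mlp_ratio * dim) + linear_params(mlp_ratio * dim, dim)
    # Each LayerNorm has a scale and a shift vector of length dim.
    layer_norms = 2 * (2 * dim)
    return attention + mlp + layer_norms


DIM, DEPTH, BOTTLENECK = 768, 12, 64  # assumed values, not taken from the paper

frozen = DEPTH * vit_block_params(DIM)                # pre-trained blocks, kept fixed
trainable = DEPTH * adapter_params(DIM, BOTTLENECK)   # one adapter per block (assumed)

print(f"frozen backbone blocks: {frozen:,}")
print(f"trainable adapters:     {trainable:,}")
print(f"trainable fraction:     {trainable / (frozen + trainable):.2%}")
```

Under these assumed sizes the adapters account for only a small fraction of the total parameters, which is the general motivation for adapter-based fine-tuning when annotated data is scarce. The actual EffiSeg figures may differ.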