Abstract: Skin cancer is a leading malignant disease with rising incidence rates, underscoring the need for early and accurate diagnosis. This paper introduces a new fusion method for multi-class skin lesion classification that combines the Vision Transformer (ViT) and Vision Permutator (ViP) models. The proposed method leverages the global attention mechanism of ViT and the spatial encoding capabilities of ViP to enhance classification performance. Additionally, various data augmentation techniques, such as random zoom, flip, shift, and range adjustments, are applied to address class imbalance. The proposed method is evaluated and analyzed on the ISIC2019 dataset. The models were trained on ISIC2019 from scratch, without pretraining on a large dataset such as ImageNet. The experimental results demonstrated that the fusion models, particularly Fusion cat and Fusion max, achieved superior performance compared to the individual ViT and ViP models. Specifically, Fusion max attained an accuracy of 80.86%, while Fusion cat reached 77.96% mean recall, 76.81% mean precision, and a 77.38% F1-score. These findings suggest that the proposed models can significantly enhance automated skin lesion classification, contributing to the early diagnosis of skin cancer.
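The abstract's "Fusion cat" and "Fusion max" names suggest feature-level fusion by concatenation and by element-wise maximum, respectively. The sketch below illustrates that idea only; the backbone modules, feature dimension, and the assumption of 8 output classes are placeholders, not the authors' implementation.

```python
# Minimal sketch of feature-level fusion of two backbones (ViT and ViP here are
# stand-ins): "cat" concatenates their feature vectors, "max" takes the
# element-wise maximum before a shared classification head.
import torch
import torch.nn as nn


class FusionClassifier(nn.Module):
    def __init__(self, vit: nn.Module, vip: nn.Module,
                 feat_dim: int, num_classes: int = 8, mode: str = "cat"):
        super().__init__()
        self.vit = vit          # assumed to output (batch, feat_dim) features
        self.vip = vip          # assumed to output (batch, feat_dim) features
        self.mode = mode
        head_in = 2 * feat_dim if mode == "cat" else feat_dim
        self.head = nn.Linear(head_in, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f_vit = self.vit(x)
        f_vip = self.vip(x)
        if self.mode == "cat":                      # "Fusion cat"
            fused = torch.cat([f_vit, f_vip], dim=-1)
        else:                                       # "Fusion max"
            fused = torch.maximum(f_vit, f_vip)
        return self.head(fused)


if __name__ == "__main__":
    # Toy usage with trivial stand-in backbones (flatten + linear) on 224x224 RGB input.
    feat_dim = 256
    make_backbone = lambda: nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, feat_dim))
    model = FusionClassifier(make_backbone(), make_backbone(), feat_dim, mode="max")
    logits = model(torch.randn(2, 3, 224, 224))
    print(logits.shape)  # torch.Size([2, 8])
```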