Keywords: Vision Transformers, Deep Learning, Skin Cancer, Dermatology Images
Abstract: Recent advances in computer vision have made Vision Transformers (ViTs) strong alternatives to convolutional neural networks (CNNs) in medical imaging. We compare state-of-the-art ViT models, including Token-to-Token ViT, CaiT, LeViT, ATSViT, and XCiT, on the Kaggle skin cancer dataset, focusing on classification accuracy, recall score, and model complexity. ViT variants designed for small datasets achieve high accuracy but have many parameters, whereas LeViT delivers strong performance with the fewest parameters. This review highlights current trends, deployment challenges, and future directions for transformers in skin cancer detection.
Submission Number: 85