A Multimodal Deep Ensemble Framework for Skin Lesion Classification

Nhan Le Thanh Pham, Duc Dat Pham, Tan Duy Le, Kha Tu Huynh

Published: 2025, Last Modified: 19 Mar 2026, IUKM (1) 2025, CC BY-SA 4.0

Abstract: Accurate classification of skin lesions is crucial for the early detection and treatment of skin cancer. This study introduces a deep ensemble framework that leverages multimodal data, combining images captured with smartphone cameras and patient metadata. The ensemble comprises three distinct convolutional neural network (CNN) architectures integrated with efficient attention mechanisms. Evaluated on the PAD-UFES-20 dataset across six diagnostic classes of skin lesions, the proposed framework achieved an overall accuracy of 84.35%, outperforming each individual model and their baseline benchmarks. The ensemble improved precision and recall across most classes, particularly in complex and underrepresented categories, which indicates that it mitigates the limitations of the individual models. These findings underscore the potential of deep ensemble strategies in medical image classification for clinical applications in dermatology and for improving the reliability of skin cancer diagnostics.
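
Illustrative sketch: The abstract does not state how the outputs of the three CNNs are combined. One common option, assumed here purely for illustration, is unweighted soft voting: each model emits a probability vector over the six classes, the vectors are averaged, and the class with the highest mean probability is chosen. The function names `soft_vote` and `predict` and the class indices are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def soft_vote(prob_sets: Sequence[Sequence[float]]) -> List[float]:
    """Average per-class probabilities across models (unweighted soft voting).

    Assumption: every model outputs one probability per class, in the same
    class order. This is a generic ensemble rule, not the paper's stated method.
    """
    if not prob_sets:
        raise ValueError("at least one model output is required")
    n_models = len(prob_sets)
    n_classes = len(prob_sets[0])
    if any(len(p) != n_classes for p in prob_sets):
        raise ValueError("all models must output the same number of classes")
    return [sum(p[c] for p in prob_sets) / n_models for c in range(n_classes)]


def predict(prob_sets: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the highest averaged probability."""
    avg = soft_vote(prob_sets)
    return max(range(len(avg)), key=avg.__getitem__)


if __name__ == "__main__":
    # Made-up probabilities for one image over six classes (indices 0..5).
    model_a = [0.10, 0.50, 0.05, 0.20, 0.10, 0.05]
    model_b = [0.05, 0.30, 0.05, 0.45, 0.10, 0.05]  # alone, this model picks class 3
    model_c = [0.10, 0.45, 0.05, 0.25, 0.10, 0.05]
    print(predict([model_a, model_b, model_c]))
```

In this made-up example, one model disagrees with the other two, and averaging lets the majority view prevail. This is the kind of effect the abstract describes as mitigating the limitations of individual models.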

External IDs: dblp:conf/iukm/PhamPLH25