Enhancing Trust in AI-Driven Dermatology: CLIP for Explainable Skin Lesion Diagnosis

Published: 09 Dec 2024, Last Modified: 15 Dec 2024, AIM-FM Workshop @ NeurIPS'24 (Reject), License: CC BY 4.0
Keywords: melanoma, multi-modal features, vision language model, explainability, CLIP
TL;DR: We fine-tune CLIP as a skin lesion classifier using text and visual features and explore the explainability of its classification decisions.
Abstract: Skin carcinoma is the most common cancer worldwide, costing over $8 billion annually. Early diagnosis is vital: melanoma survival rates rise from 23% for late-stage detection to 99% when the disease is caught early. Deep neural networks show promising results in classifying skin lesions as benign or malignant, but clinicians are typically reluctant to trust black-box methods. In this paper, we fine-tune the CLIP (Contrastive Language-Image Pre-training) model on several skin lesion datasets to capture meaningful relationships between visual features and related diagnostic terms, with the goal of increasing explainability. We also apply Grad-ECLIP, a gradient-based visual explanation method for CLIP that highlights the critical image regions linked to specific diagnostic descriptions. The resulting pipeline not only classifies skin lesions and generates corresponding descriptions but also adds a layer of visual explanations.
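To make the described pipeline concrete, the sketch below shows the general pattern of CLIP-based lesion classification with diagnostic text prompts, followed by a naive input-gradient saliency map. It uses the public OpenAI `clip` package; the prompt texts, the `lesion.jpg` filename, and the checkpoint are illustrative assumptions, and the pixel-gradient saliency is only a crude stand-in for Grad-ECLIP, which operates on CLIP's internal attention rather than raw pixels. The paper fine-tunes CLIP on skin lesion datasets, whereas this sketch shows the zero-shot inference pattern only.

```python
# Minimal sketch: CLIP lesion scoring + naive gradient saliency.
# Assumptions: the OpenAI "clip" package is installed
# (pip install git+https://github.com/openai/CLIP.git) and a
# dermoscopic image "lesion.jpg" exists. Prompts are hypothetical
# examples, not the paper's actual diagnostic descriptions.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Diagnostic descriptions acting as class prompts.
prompts = [
    "a dermoscopic image of a benign skin lesion",
    "a dermoscopic image of a malignant melanoma",
]
text_tokens = clip.tokenize(prompts).to(device)

image = preprocess(Image.open("lesion.jpg")).unsqueeze(0).to(device)
image.requires_grad_(True)  # needed for the gradient saliency below

# Encode both modalities, normalize, and score image-text similarity.
image_feat = model.encode_image(image)
text_feat = model.encode_text(text_tokens)
image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
logits = 100.0 * image_feat @ text_feat.T  # scaled cosine similarities
probs = logits.softmax(dim=-1)
print({p: f"{pr:.3f}" for p, pr in zip(prompts, probs[0].tolist())})

# Naive saliency: gradient of the "malignant" similarity w.r.t. input
# pixels. This is NOT Grad-ECLIP, only a simplified gradient-based
# visual explanation in the same spirit.
logits[0, 1].backward()
saliency = image.grad.abs().max(dim=1)[0].squeeze()  # (H, W) heat map
print("saliency map shape:", tuple(saliency.shape))
```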
Submission Number: 58