Keywords: Explainability, Vision Language Models, Face Recognition, X-ray Diagnosis
TL;DR: A framework that enables deep learning models to provide human-expert-like explanations
Abstract: In mission-critical domains such as law enforcement and medical diagnosis, the ability to explain and interpret the outputs of deep learning models is crucial for ensuring user trust and supporting informed decision-making. Despite advancements in explainability, existing methods often fall short of providing explanations that mirror the depth and clarity of those given by human experts. Such expert-level explanations are essential for the dependable application of deep learning models in law enforcement and medical contexts. Additionally, we recognize that most explanations in real-world scenarios are communicated primarily through natural language. Addressing these needs, we propose a novel approach that uses characteristic descriptors to explain model decisions by identifying their presence in images, thereby generating expert-like explanations. Our method incorporates a concept bottleneck layer within the model architecture, which computes the similarity between image and descriptor encodings to deliver inherent and faithful explanations. Through experiments in face recognition and chest X-ray diagnosis, we show that our approach contrasts sharply with existing techniques, which are often limited to saliency maps. Our approach represents a significant step toward making deep learning systems more accountable, transparent, and trustworthy in the critical domains of face recognition and medical diagnosis.
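To make the abstract's mechanism concrete, below is a minimal sketch of a descriptor-based concept bottleneck, assuming a CLIP-style setup with an image encoder and a precomputed bank of descriptor text embeddings. The names (DescriptorBottleneck, descriptor_embeds) and shapes are illustrative assumptions, not the authors' actual implementation: the bottleneck activations are simply cosine similarities between the image embedding and each characteristic descriptor, and a linear head maps those similarities to class logits, so the similarity scores themselves serve as the explanation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DescriptorBottleneck(nn.Module):
    """Concept bottleneck: cosine similarity between the image embedding and a
    fixed bank of characteristic-descriptor text embeddings, followed by a
    linear head over those similarity scores."""

    def __init__(self, image_encoder: nn.Module, descriptor_embeds: torch.Tensor,
                 num_classes: int):
        super().__init__()
        self.image_encoder = image_encoder  # maps images -> (B, d) embeddings
        # Descriptor text embeddings are precomputed and frozen: (K, d)
        self.register_buffer("descriptors", F.normalize(descriptor_embeds, dim=-1))
        self.head = nn.Linear(descriptor_embeds.size(0), num_classes)

    def forward(self, images: torch.Tensor):
        img = F.normalize(self.image_encoder(images), dim=-1)  # (B, d)
        concept_scores = img @ self.descriptors.t()            # (B, K) similarities
        logits = self.head(concept_scores)                     # (B, num_classes)
        # concept_scores indicate which descriptors are "present" in the image
        # and can be reported as the natural-language explanation.
        return logits, concept_scores

# Usage with hypothetical shapes: a toy 512-d encoder, 40 descriptors, 14 classes.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 512))
model = DescriptorBottleneck(encoder, torch.randn(40, 512), num_classes=14)
logits, scores = model(torch.randn(2, 3, 224, 224))
```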
Track: Main track
Submitted Paper: No
Published Paper: No
Submission Number: 61