Rather a Nurse than a Physician - Contrastive Explanations under Investigation

Oliver Eberle; Ilias Chalkidis; Laura Cabello; Stephanie Brandl

Published: 07 Oct 2023, Last Modified: 01 Dec 2023, EMNLP 2023 Main

Submission Type: Regular Long Paper

Submission Track: Interpretability, Interactivity, and Analysis of Models for NLP

Keywords: explainability, contrastive explanations, human annotations

TL;DR: We compare contrastive/non-contrastive model-based explanations with contrastive/non-contrastive human rationales that we collected for this purpose.

Abstract: Contrastive explanations, where one decision is explained *in contrast to another*, are supposed to be closer to how humans explain a decision than non-contrastive explanations, where the decision is not necessarily referenced to an alternative. This claim has never been empirically validated. We analyze four English text-classification datasets (SST2, DynaSent, BIOS and DBpedia-Animals). We fine-tune and extract explanations from three different models (RoBERTa, GPT-2, and T5), each in three different sizes, and apply three post-hoc explainability methods (LRP, Gradient×Input, GradNorm). We furthermore collect and release human rationale annotations for a subset of 100 samples from the BIOS dataset for contrastive and non-contrastive settings. A cross-comparison between model-based rationales and human annotations, both in contrastive and non-contrastive settings, yields a high agreement between the two settings for models as well as for humans. Moreover, model-based explanations computed in both settings align equally well with human rationales. Thus, we empirically find that humans do not necessarily explain in a contrastive manner.
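Illustration: the difference between the two settings can be sketched with Gradient×Input on a toy linear classifier. This is a minimal sketch under stated assumptions, not the paper's implementation: the weights, the feature names, the two-class setup, and the helper `grad_x_input` are all hypothetical, and the contrastive variant is assumed to explain the difference between the target logit and the foil logit.

```python
# Toy linear classifier: logits[c] = sum_i W[c][i] * x[i].
# All weights, features and class names are made up for illustration.

def grad_x_input(weights, x, target, foil=None):
    """Gradient x Input relevance per input feature.

    Non-contrastive: explains the target logit.
    Contrastive (assumed form): explains logit[target] - logit[foil].
    For a linear model, the gradient of a logit w.r.t. x is its weight row.
    """
    grad = list(weights[target])
    if foil is not None:
        grad = [g - f for g, f in zip(grad, weights[foil])]
    return [g * xi for g, xi in zip(grad, x)]


CLASSES = {"nurse": 0, "physician": 1}
FEATURES = ["patients", "hospital", "bedside", "diagnosis"]
W = [
    [1.0, 1.0, 2.0, 0.0],  # hypothetical weights for "nurse"
    [1.0, 1.0, 0.0, 2.0],  # hypothetical weights for "physician"
]
x = [1.0, 1.0, 1.0, 0.0]  # the input mentions the first three features

non_contrastive = grad_x_input(W, x, CLASSES["nurse"])
contrastive = grad_x_input(W, x, CLASSES["nurse"], foil=CLASSES["physician"])

for name, nc, c in zip(FEATURES, non_contrastive, contrastive):
    print(f"{name:10s} non-contrastive={nc:+.1f} contrastive={c:+.1f}")
```

In this toy setup, features shared by both classes ("patients", "hospital") receive relevance in the non-contrastive explanation but cancel out in the contrastive one, which leaves only the feature that separates the target from the foil.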

Submission Number: 3370
