MaNtLE: Model-agnostic Natural Language Explainer

Rakesh R Menon; Kerem Zaman; Shashank Srivastava

MaNtLE: Model-agnostic Natural Language Explainer

Rakesh R Menon, Kerem Zaman, Shashank Srivastava

Published: 07 Oct 2023, Last Modified: 01 Dec 2023EMNLP 2023 MainEveryoneRevisionsBibTeX

Submission Type: Regular Long Paper

Submission Track: Interpretability, Interactivity, and Analysis of Models for NLP

Submission Track 2: Machine Learning for NLP

Keywords: explainable AI, interpretability

TL;DR: We propose MaNtLE a model-agnostic natural language explainer that analyzes multiple classifier predictions and generates faithful natural language explanations of classifier rationale for structured classification tasks.

Abstract: Understanding the internal reasoning behind the predictions of machine learning systems is increasingly vital, given their rising adoption and acceptance. While previous approaches, such as LIME generate algorithmic explanations by attributing importance to input features for individual examples, recent research indicates that practitioners prefer examining language explanations that explain sub-groups of examples (Lakkaraju et al., 2022). In this paper, we introduce MaNtLE, a model-agnostic natural language explainer that analyzes a set of classifier predictions and generates faithful natural language explanations of classifier rationale for structured classification tasks. MaNtLE uses multi-task training on thousands of synthetic classification tasks to generate faithful explanations. Our experiments indicate that, on average, MaNtLE-generated explanations are at least 11% more faithful compared to LIME and Anchors explanations across three tasks. Human evaluations demonstrate that users can better predict model behavior using explanations from MaNtLE compared to other techniques.

Submission Number: 2369

Loading