Are Large Language Models Post Hoc Explainers?

Published: 27 Oct 2023, Last Modified: 27 Oct 2023, NeurIPS XAIA 2023
Abstract: Large Language Models (LLMs) are increasingly used as powerful tools for a wide range of natural language processing (NLP) applications. A recent innovation, in-context learning (ICL), enables LLMs to learn new tasks by supplying a few examples in the prompt at inference time, thereby eliminating the need for model fine-tuning. While LLMs have been utilized in several applications, their potential to explain the behavior of other models remains relatively unexplored. Despite the growing number of new explanation techniques, many require white-box access to the model and/or are computationally expensive, highlighting the need for next-generation post hoc explainers. In this work, we present the first framework to study the effectiveness of LLMs in explaining other predictive models. More specifically, we propose a novel framework encompassing multiple prompting strategies: i) Perturbation-based ICL, ii) Prediction-based ICL, iii) Instruction-based ICL, and iv) Explanation-based ICL, with varying levels of information about the underlying ML model and the local neighborhood of the test sample. We conduct extensive experiments on real-world benchmark datasets and demonstrate that LLM-generated explanations perform on par with state-of-the-art post hoc explainers, as LLMs leverage both the ICL examples and their internal knowledge to generate model explanations. On average, across four datasets and two ML models, we observe that LLMs identify the most important feature with 72.19% accuracy, opening up new frontiers in explainable artificial intelligence (XAI) to explore LLM-based explanation frameworks.
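As a hedged illustration of the perturbation-based ICL strategy described in the abstract (a minimal sketch, not the authors' implementation), the snippet below assembles a prompt by perturbing a test input, querying the underlying black-box model for its predictions, and asking the LLM to name the most important features; names such as black_box_model and feature_names, and the Gaussian noise scale, are assumptions made for this example.

import numpy as np

def build_perturbation_icl_prompt(x_test, black_box_model, feature_names,
                                  n_perturbations=16, noise_scale=0.1, top_k=3):
    # Perturb the test point, record the black-box model's prediction for each
    # perturbation, and format the pairs as in-context examples for the LLM.
    rng = np.random.default_rng(0)
    lines = ["Each line shows a perturbed input and the model's prediction."]
    for _ in range(n_perturbations):
        x_pert = x_test + rng.normal(scale=noise_scale, size=x_test.shape)
        y_pert = int(black_box_model.predict(x_pert.reshape(1, -1))[0])
        feats = ", ".join(f"{name}={val:.3f}" for name, val in zip(feature_names, x_pert))
        lines.append(f"Input: {feats} -> Prediction: {y_pert}")
    # Final instruction asking the LLM to rank the features it considers most important.
    lines.append(f"Based on the examples above, list the {top_k} most important features "
                 "driving the model's prediction, from most to least important.")
    return "\n".join(lines)

The returned string would be sent to the LLM as a single prompt; the other prompting strategies differ mainly in how much of this local-neighborhood and prediction information is exposed to the LLM.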
Submission Track: Full Paper Track
Application Domain: Natural Language Processing
Survey Question 1: In our study, we turned to the latest advancements in natural language processing, particularly Large Language Models (LLMs), to see if they could offer explanations for the predictions of other machine learning models. LLMs are known for their capability to understand and generate human-like text based on a few input examples. We designed various strategies to prompt these LLMs, feeding them information about a given prediction, and asking them to generate explanations. Our experiments on multiple datasets showed that LLMs could produce reliable and understandable explanations, often performing as well as or better than current state-of-the-art explanation techniques.
Survey Question 2: We opted for benchmark datasets like Recidivism and Credit, which carry well-known challenges and implications around model interpretability. While powerful predictive models exist, many current explanation techniques require intrusive white-box access or impose heavy computational burdens. These constraints limit their applicability, especially when rapid or scalable explanations are required. Moreover, a lack of clarity about why models make specific predictions can inhibit their wider adoption, particularly in sensitive applications. Our observation of these challenges in existing methods, combined with the versatility of LLMs, spurred our exploration into harnessing LLMs as more efficient and accessible explainers.
Survey Question 3: In our work, we introduce a novel approach to explainability that leverages the capabilities of Large Language Models (LLMs). Rather than relying solely on existing techniques, we designed a framework that encompasses four prompting strategies: Perturbation-based ICL, Prediction-based ICL, Instruction-based ICL, and Explanation-based ICL. To validate the effectiveness and reliability of our LLM-based explanation method, we conducted comprehensive tests against well-established XAI techniques such as LIME, SHAP, Gradients, Integrated Gradients, SmoothGrad, and Input x Gradients. This comparative evaluation allowed us to gauge how our LLM-generated explanations stand relative to state-of-the-art explainers.
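As a hedged sketch of how such a comparison can be scored (not the paper's evaluation code), the snippet below computes a simple top-k feature agreement between the LLM's ranked feature list and the ranking produced by a reference explainer such as LIME or SHAP; the example feature names are hypothetical.

def feature_agreement(llm_ranking, reference_ranking, k=1):
    # Fraction of the reference explainer's top-k features that also appear
    # among the LLM's top-k features.
    llm_top = set(llm_ranking[:k])
    ref_top = set(reference_ranking[:k])
    return len(llm_top & ref_top) / k

# Hypothetical example: both rankings agree on the single most important feature.
print(feature_agreement(["priors_count", "age", "charge_degree"],
                        ["priors_count", "juv_fel_count", "age"], k=1))  # 1.0

With k=1 this reduces to checking whether the LLM recovers the reference explainer's single most important feature, which is the kind of measurement behind the 72.19% figure reported in the abstract.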
Submission Number: 76