LMExplainer: A Knowledge-Enhanced Explainer for Language Models

Anonymous

16 Dec 2023 · ACL ARR 2023 December Blind Submission · Readers: Everyone
TL;DR: LMExplainer improves the transparency of (large) LMs via a more transparent graph surrogate, producing clearer, human-understandable explanations and outperforming existing models on major datasets.
Abstract: Language models (LMs) such as GPT-4 are adept at tasks ranging from text generation to question answering. However, their decision process lacks transparency due to complex model structures and millions of parameters. This hinders user trust in LMs, especially in safety-critical applications. Given the opaque nature of LMs, a promising approach to explaining how they work is to generate explanations on a more transparent surrogate (e.g., a knowledge graph (KG)). Such works mostly exploit attention weights to provide explanations for LM recommendations. However, pure attention-based explanations lack the scalability to keep up with the growing complexity of LMs. To bridge this important gap, we propose LMExplainer, a knowledge-enhanced explainer for LMs capable of providing human-understandable explanations. It is designed to efficiently locate the most relevant knowledge within a large-scale KG via a graph attention network (GAT) and extract key decision signals reflecting how a given LM works. Extensive experiments comparing LMExplainer against eight state-of-the-art baselines show that it outperforms existing LM+KG methods and large LMs (LLMs) on the CommonsenseQA and OpenBookQA datasets. We compare the explanations generated by LMExplainer with other algorithm-generated explanations as well as human-annotated explanations. The results show that LMExplainer generates more comprehensive and clearer explanations.
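To illustrate the kind of mechanism the abstract describes (this is a minimal sketch, not the paper's implementation), the code below applies a single-head graph attention layer to a toy retrieved KG subgraph and ranks nodes by the attention mass they receive, as a rough proxy for "key decision signals". The node names, edge list, feature dimensions, and ranking heuristic are all illustrative assumptions.

```python
# Hypothetical sketch: score nodes of a toy knowledge subgraph with a
# single-head graph attention layer and rank them by attention mass.
# Not the paper's code; names and dimensions are illustrative assumptions.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy retrieved subgraph: concept nodes and directed (source, target) edges.
nodes = ["question_entity", "bird", "can_fly", "penguin"]
edges = [(0, 1), (1, 2), (0, 3), (3, 1)]

d_in, d_out = 8, 8
x = torch.randn(len(nodes), d_in)              # node features (e.g., LM embeddings)
W = torch.nn.Linear(d_in, d_out, bias=False)   # shared linear projection
a = torch.nn.Linear(2 * d_out, 1, bias=False)  # attention scoring vector

h = W(x)  # projected node features

# Unnormalized attention logits e_ij = LeakyReLU(a^T [h_i || h_j]) per edge.
src = torch.tensor([s for s, _ in edges])
dst = torch.tensor([t for _, t in edges])
logits = F.leaky_relu(a(torch.cat([h[src], h[dst]], dim=-1)).squeeze(-1), 0.2)

# Softmax-normalize attention over each target node's incoming edges.
alpha = torch.zeros(len(edges))
for t in dst.unique():
    mask = dst == t
    alpha[mask] = F.softmax(logits[mask], dim=0)

# Rank each node by the total attention it receives as a message sender:
# higher mass suggests a larger contribution to downstream reasoning.
importance = torch.zeros(len(nodes))
importance.index_add_(0, src, alpha)
for idx in importance.argsort(descending=True).tolist():
    print(f"{nodes[idx]:>16s}: {importance[idx].item():.3f}")
```

In the actual LM+KG setting described in the abstract, the node features would come from the LM's representations of retrieved KG entities, and the attention scores would feed both the answer prediction and the natural-language explanation; the ranking loop here is only a simplified stand-in for that pipeline.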
Paper Type: long
Research Area: Interpretability and Analysis of Models for NLP
Contribution Types: Model analysis & interpretability
Languages Studied: English
