LMExplainer: A Knowledge-Enhanced Explainer for Language Models

24 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: Explainability, XAI, Language Model
Abstract: Language models (LMs), such as GPT-4, are powerful tools for natural language processing, capable of handling diverse tasks from text generation to question answering. However, their decision processes lack transparency due to complex, multi-layered, and nonlinear model structures involving millions of parameters. This hinders user trust in LMs, especially in safety-critical applications. Given the opaque nature of LMs, a promising approach to explaining how they work is to generate explanations on a more transparent surrogate, such as a knowledge graph (KG). Existing works of this kind mostly exploit attention weights to explain LM predictions. However, purely attention-based explanations do not scale with the growing complexity of LMs. To bridge this important gap, we propose LMExplainer, a knowledge-enhanced explainer for LMs capable of providing human-understandable explanations. It is designed to efficiently locate the most relevant knowledge within a large-scale KG via a graph attention network (GAT), extracting key decision signals that reflect how a given LM works. Extensive experiments comparing LMExplainer against seven state-of-the-art baselines show that it outperforms existing LM+KG methods on the CommonsenseQA and OpenBookQA datasets. We also compare the explanations generated by LMExplainer with other algorithm-generated explanations as well as human-annotated explanations; the results show that LMExplainer generates more comprehensive and clearer explanations.
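To make the core mechanism concrete: the abstract describes scoring KG nodes and edges with graph attention so that the attention weights double as importance signals for the explanation. Below is a minimal, self-contained sketch of a single-head GAT layer in PyTorch in that spirit. It is not the paper's implementation; the dimensions, the toy graph, and the idea of reading `alpha` off as per-edge importance are illustrative assumptions only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """Single-head graph attention layer (in the style of Velickovic et al., 2018).

    Illustrative sketch only -- LMExplainer's actual architecture is not
    reproduced here; all names and sizes below are hypothetical.
    """

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)   # node feature projection
        self.a = nn.Linear(2 * out_dim, 1, bias=False)    # attention scoring vector

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor):
        # x: [num_nodes, in_dim] embeddings of retrieved KG nodes
        # edge_index: [2, num_edges] (source, target) index pairs
        h = self.W(x)
        src, dst = edge_index
        # Attention logit per edge from concatenated endpoint features.
        e = F.leaky_relu(self.a(torch.cat([h[src], h[dst]], dim=-1))).squeeze(-1)
        # Normalize over each target node's incoming edges.
        alpha = torch.zeros_like(e)
        for node in dst.unique():
            mask = dst == node
            alpha[mask] = F.softmax(e[mask], dim=0)
        # Aggregate neighbor messages weighted by attention.
        out = torch.zeros_like(h)
        out.index_add_(0, dst, alpha.unsqueeze(-1) * h[src])
        # alpha gives an importance score per KG edge -- the kind of signal
        # an explainer could surface as "key decision signals".
        return out, alpha

# Toy usage: 4 KG nodes with 8-dim features, 4 directed edges.
x = torch.randn(4, 8)
edge_index = torch.tensor([[0, 1, 2, 3],
                           [1, 2, 3, 1]])
layer = GATLayer(8, 16)
h, edge_importance = layer(x, edge_index)
```

Under this reading, the highest-weighted edges in `edge_importance` would identify the KG facts most relevant to the LM's answer, which could then be verbalized into a human-understandable explanation.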
Primary Area: visualization or interpretation of learned representations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9174