Explaining black box text modules in natural language with language models

Published: 27 Oct 2023, Last Modified: 06 Nov 2023, NeurIPS XAIA 2023
Abstract: Large language models (LLMs) have demonstrated remarkable prediction performance for a growing array of tasks. However, their rapid proliferation and increasing opaqueness have created a growing need for interpretability. Here, we ask whether we can automatically obtain natural language explanations for black box text modules. A *text module* is any function that maps text to a scalar continuous value, such as a submodule within an LLM or a fitted model of a brain region. *Black box* indicates that we only have access to the module's inputs/outputs. We introduce Summarize and Score (SASC), a method that takes in a text module and returns a natural language explanation of the module's selectivity along with a score for how reliable the explanation is. We study SASC in 2 contexts. First, we evaluate SASC on synthetic modules and find that it often recovers ground truth explanations. Second, we use SASC to explain modules found within a pre-trained BERT model, enabling inspection of the model's internals.
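The abstract describes SASC only at a high level. As a rough illustration of the summarize-then-score loop it outlines, the sketch below shows one way such a procedure could be structured in Python; the names `module`, `corpus`, `llm_summarize`, and `llm_generate` are hypothetical placeholders, not the authors' implementation.

```python
# Minimal sketch of a summarize-then-score loop, assuming the caller supplies
# a black-box text module and two LLM helpers (hypothetical placeholders).
from typing import Callable, List, Tuple
import statistics

def sasc_explain(
    module: Callable[[str], float],                 # black-box text module: text -> scalar
    corpus: List[str],                              # candidate snippets to probe the module with
    llm_summarize: Callable[[List[str]], str],      # assumed LLM call: summarize snippets
    llm_generate: Callable[[str, int], List[str]],  # assumed LLM call: generate texts for a topic
    top_k: int = 10,
    n_synthetic: int = 5,
) -> Tuple[str, float]:
    # Step 1 (Summarize): find the snippets that most activate the module,
    # then ask an LLM what they have in common.
    top_snippets = sorted(corpus, key=module, reverse=True)[:top_k]
    explanation = llm_summarize(top_snippets)

    # Step 2 (Score): compare the module's mean response on text generated to
    # match the explanation vs. unrelated baseline text; a larger gap suggests
    # the explanation better captures the module's selectivity.
    related = llm_generate(explanation, n_synthetic)
    unrelated = llm_generate("an unrelated topic", n_synthetic)
    score = statistics.mean(map(module, related)) - statistics.mean(map(module, unrelated))
    return explanation, score
```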
Submission Track: Full Paper Track
Application Domain: Natural Language Processing
Survey Question 1: Large language models, such as ChatGPT, consist of a large number of modules, which are difficult to interpret efficiently. We propose a method, called SASC, that helps to automatically explain the function of a module with a short natural-language description.
Survey Question 2: Existing approaches for automatically explaining text modules are limited, often requiring a great deal of human effort in sifting through text inputs and module outputs to guess what a module is doing. Our approach helps to automate this process.
Survey Question 3: We use large language models themselves to generate and evaluate explanations.
Submission Number: 16