DeepDecipher: Accessing and Investigating Neuron Activation in Large Language Models

Published: 27 Oct 2023, Last Modified: 21 Nov 2023, NeurIPS XAIA 2023
TL;DR: A tool for LLM interpretability
Abstract: As large language models (LLMs) become more capable, there is an urgent need for interpretable and transparent tools. Current methods are difficult to implement, and accessible tools to analyze model internals are lacking. To bridge this gap, we present DeepDecipher - an API and interface for probing neurons in transformer models' MLP layers. DeepDecipher makes the outputs of advanced interpretability techniques readily available for LLMs. The easy-to-use interface also makes inspecting these complex models more intuitive. This paper outlines DeepDecipher's design and capabilities. We demonstrate how to analyze neurons, compare models, and gain insights into model behavior. We also contrast DeepDecipher's functionality with similar tools like Neuroscope and OpenAI's Neuron Explainer. DeepDecipher enables efficient, scalable analysis of LLMs. By granting access to state-of-the-art interpretability methods, DeepDecipher makes LLMs more transparent, trustworthy, and safe. Researchers, engineers, and developers can quickly diagnose issues, audit systems, and advance the field.
Submission Track: Demo Track
Application Domain: Natural Language Processing
Survey Question 1: Some methods have been developed to aid in interpreting the MLP neurons of large language models, but they can be time-consuming and unwieldy to use. DeepDecipher aims to solve this problem by providing easy access to the results of these methods on as many models as possible in as useful a format as possible.
Survey Question 2: Supporting interpretability research in LLMs was the entire point of the work, so there was never any question.
Survey Question 3: We currently provide data from Neuroscope, Neuron2Graph and Neuron Explainer, but plan to add more.
Submission Number: 36