CLIP-Dissect: Automatic description of neuron representations in deep vision networks

Tuomas Oikarinen; Tsui-Wei Weng

Published: 25 Mar 2022, Last Modified: 26 May 2025, ICLR 2022 PAIR^2Struct Poster, Readers: Everyone

Keywords: Interpretability, Explainability, Network Dissection

TL;DR: We propose an automated method for describing the function of hidden-layer neurons in deep vision networks, leveraging the multimodal CLIP model.

Abstract: In this paper, we propose CLIP-Dissect, a new technique to automatically describe the function of individual hidden neurons inside vision networks. CLIP-Dissect leverages recent advances in multimodal vision/language models to label internal neurons with open-ended concepts without the need for any labeled data or human examples, which existing tools require to succeed. We show that CLIP-Dissect provides more accurate descriptions than existing methods for neurons where the ground truth is available, as well as qualitatively good descriptions for hidden-layer neurons. In addition, our method is very flexible: it is model-agnostic, can easily handle new concepts, and can be extended to take advantage of better multimodal models in the future. Finally, CLIP-Dissect is computationally efficient and labels all neurons of a layer in a large vision model in tens of minutes.
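
The abstract does not spell out how a neuron is matched to a concept, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes that each candidate concept has a per-image score over a small set of probe images (in practice such scores might come from a CLIP-like model; here they are made-up numbers), and that the neuron's activations on the same images are available. Mean-centered cosine similarity is used purely as a stand-in matching rule; it is not claimed to be the similarity function used in the paper. The names `label_neuron`, `concept_scores`, and `activations` are hypothetical.

```python
import math

def center(values):
    """Subtract the mean so that only the pattern across images matters."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def label_neuron(activations, concept_scores):
    """Pick the concept whose per-image scores best track the neuron's activations.

    Assumption: mean-centered cosine similarity is an illustrative matching
    rule only, not the paper's method.
    """
    centered_acts = center(activations)
    return max(
        concept_scores,
        key=lambda concept: cosine(centered_acts, center(concept_scores[concept])),
    )

# Toy data (made up): scores of three concepts on four probe images.
concept_scores = {
    "dog":     [0.9, 0.8, 0.1, 0.2],
    "stripes": [0.1, 0.2, 0.9, 0.7],
    "sky":     [0.5, 0.4, 0.5, 0.6],
}

# Toy activations of one hidden neuron on the same four images.
activations = [0.2, 0.1, 3.0, 2.5]

label = label_neuron(activations, concept_scores)
print(label)  # stripes
```

Because the concept set is just a dictionary of names and scores, adding a new candidate concept in this sketch only requires adding another entry, which mirrors the flexibility the abstract describes.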

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/clip-dissect-automatic-description-of-neuron/code)
