Explainable Concept Generation through Vision-Language Preference Learning for Understanding Neural Networks' Internal Representations
TL;DR: A method to automatically articulate concepts to explain neural networks.
Abstract: Understanding the internal representations of a neural network helps users improve models. Concept-based methods have become a popular choice for explaining deep neural networks post-hoc because, unlike most other explainable AI techniques, they can be used to test high-level visual "concepts" that are not expressed directly in terms of input feature attributions. For instance, the concept of "stripes" is important for classifying an image as a zebra. Concept-based explanation methods, however, require practitioners to guess and manually collect multiple candidate concept image sets, making the process labor-intensive and prone to overlooking important concepts. To address this limitation, we frame concept image set creation as an image generation problem. However, since naively using a standard generative model does not yield meaningful concepts, we devise a reinforcement learning-based preference optimization (RLPO) algorithm that fine-tunes a vision-language generative model using approximate textual descriptions of concepts. Through a series of experiments, we demonstrate our method's ability to efficiently and reliably articulate diverse concepts that would otherwise be challenging to craft manually.
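For context, the sketch below illustrates the TCAV-style concept score that concept-based explanation pipelines typically compute once a concept image set is available (whether hand-collected or generated). It is a minimal illustration, not the authors' implementation: the names `model`, `"layer4"`, the image batches, and `class_idx` are placeholder assumptions.

```python
# Illustrative TCAV-style concept scoring (sketch; not the RLPO codebase).
# Assumes `model` is a torchvision-style classifier in eval mode and
# "layer4" names one of its modules; all inputs are placeholder tensors.
import torch
from sklearn.linear_model import LogisticRegression


def _acts_and_grads(model, layer_name, images, class_idx=None):
    """Return the chosen layer's (flattened) activations for `images`, and
    optionally the gradients of the target class logit w.r.t. them."""
    store = {}
    layer = dict(model.named_modules())[layer_name]
    handle = layer.register_forward_hook(lambda m, i, o: store.setdefault("act", o))
    logits = model(images)
    handle.remove()
    act = store["act"]
    if class_idx is None:
        return act.flatten(1).detach(), None
    grads = torch.autograd.grad(logits[:, class_idx].sum(), act)[0]
    return act.flatten(1).detach(), grads.flatten(1)


def tcav_score(model, layer_name, concept_imgs, random_imgs, class_imgs, class_idx):
    """Fraction of class images whose class logit increases along the concept
    direction (the CAV) at `layer_name`; higher means the concept matters more."""
    # 1. Learn a concept activation vector (CAV): a linear direction that
    #    separates concept-image activations from random-image activations.
    a_concept, _ = _acts_and_grads(model, layer_name, concept_imgs)
    a_random, _ = _acts_and_grads(model, layer_name, random_imgs)
    X = torch.cat([a_concept, a_random]).cpu().numpy()
    y = [1] * len(a_concept) + [0] * len(a_random)
    cav = torch.tensor(
        LogisticRegression(max_iter=1000).fit(X, y).coef_[0], dtype=torch.float32
    )

    # 2. Directional derivative of the class logit along the CAV for each
    #    class image; the score is the fraction with positive sensitivity.
    _, grads = _acts_and_grads(model, layer_name, class_imgs, class_idx)
    sensitivities = grads.cpu() @ cav
    return (sensitivities > 0).float().mean().item()
```

In a setup like the one the abstract describes, a score of this kind could serve as feedback on candidate concept image sets, with the generative model fine-tuned via preference optimization toward concepts the classifier is actually sensitive to; the exact reward and optimization details are specified in the paper and linked repository.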
Lay Summary: Have you ever wondered how an AI "thinks" when it makes a decision? For instance, if you show a neural network a picture, how does it know whether it's a zebra or not? It's often looking for specific visual patterns or what we call "concepts", like the black and white stripes on a zebra. Understanding these concepts helps us figure out why an AI behaves the way it does, which is really important for making AI more reliable and trustworthy.
The challenge is that identifying these key concepts can be tricky and time-consuming. You'd usually have to guess which concepts are important, manually gather lots of images to represent them, and then verify them. We've developed a new method that automates this process. Our approach uses artificial intelligence to automatically identify and generate images that represent the important concepts a neural network is looking at. This makes it much easier to understand how these complex AI systems work, without all the manual effort!
Link To Code: https://github.com/aditya-taparia/RLPO
Primary Area: Deep Learning->Generative Models and Autoencoders
Keywords: Concept-based Explainable AI, TCAV, Vision-Language Models, Reinforcement Learning
Submission Number: 8185