Explainable Concept Generation through Vision-Language Preference Learning for Understanding Neural Networks' Internal Representations
TL;DR: A method to automatically articulate concepts to explain neural networks.
Abstract: Understanding the internal representations of a neural network helps users improve models. Concept-based methods have become a popular choice for explaining deep neural networks post-hoc because, unlike most other explainable AI techniques, they can be used to test high-level visual "concepts" that are not expressed directly in terms of input feature attributions. For instance, the concept of "stripes" is important for classifying an image as a zebra. Concept-based explanation methods, however, require practitioners to guess and manually collect multiple candidate concept image sets, making the process labor-intensive and prone to overlooking important concepts. To address this limitation, we frame concept image set creation as an image generation problem. However, since naively using a standard generative model does not yield meaningful concepts, we devise a reinforcement learning-based preference optimization (RLPO) algorithm that fine-tunes a vision-language generative model using approximate textual descriptions of concepts. Through a series of experiments, we demonstrate our method's ability to efficiently and reliably articulate diverse concepts that would otherwise be challenging to craft manually.
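For context, the sketch below illustrates the TCAV-style concept score that concept-based explanation pipelines typically compute once a concept image set is available (whether hand-collected or generated). It is a minimal illustration, not the authors' implementation: the names `model`, `"layer4"`, the image batches, and `class_idx` are placeholder assumptions.

```python
# Illustrative TCAV-style concept scoring (sketch; not the RLPO codebase).
# Assumes `model` is a torchvision-style classifier in eval mode and
# "layer4" names one of its modules; all inputs are placeholder tensors.
import torch
from sklearn.linear_model import LogisticRegression


def _acts_and_grads(model, layer_name, images, class_idx=None):
    """Return the chosen layer's (flattened) activations for `images`, and
    optionally the gradients of the target class logit w.r.t. them."""
    store = {}
    layer = dict(model.named_modules())[layer_name]
    handle = layer.register_forward_hook(lambda m, i, o: store.setdefault("act", o))
    logits = model(images)
    handle.remove()
    act = store["act"]
    if class_idx is None:
        return act.flatten(1).detach(), None
    grads = torch.autograd.grad(logits[:, class_idx].sum(), act)[0]
    return act.flatten(1).detach(), grads.flatten(1)


def tcav_score(model, layer_name, concept_imgs, random_imgs, class_imgs, class_idx):
    """Fraction of class images whose class logit increases along the concept
    direction (the CAV) at `layer_name`; higher means the concept matters more."""
    # 1. Learn a concept activation vector (CAV): a linear direction that
    #    separates concept-image activations from random-image activations.
    a_concept, _ = _acts_and_grads(model, layer_name, concept_imgs)
    a_random, _ = _acts_and_grads(model, layer_name, random_imgs)
    X = torch.cat([a_concept, a_random]).cpu().numpy()
    y = [1] * len(a_concept) + [0] * len(a_random)
    cav = torch.tensor(
        LogisticRegression(max_iter=1000).fit(X, y).coef_[0], dtype=torch.float32
    )

    # 2. Directional derivative of the class logit along the CAV for each
    #    class image; the score is the fraction with positive sensitivity.
    _, grads = _acts_and_grads(model, layer_name, class_imgs, class_idx)
    sensitivities = grads.cpu() @ cav
    return (sensitivities > 0).float().mean().item()
```

In a setup like the one the abstract describes, a score of this kind could serve as feedback on candidate concept image sets, with the generative model fine-tuned via preference optimization toward concepts the classifier is actually sensitive to; the exact reward and optimization details are specified in the paper and linked repository.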
Lay Summary: Have you ever wondered how an AI "thinks" when it makes a decision? For instance, if you show a neural network a picture, how does it know whether it's a zebra or not? It's often looking for specific visual patterns or what we call "concepts", like the black and white stripes on a zebra. Understanding these concepts helps us figure out why an AI behaves the way it does, which is really important for making AI more reliable and trustworthy.
The challenge is that identifying these key concepts can be tricky and time-consuming. You'd usually have to guess which concepts are important, manually gather lots of images to represent them, and then verify them. We've developed a new method that automates this process. Our approach uses artificial intelligence to automatically identify and generate images that represent the important concepts a neural network is looking at. This makes it much easier to understand how these complex AI systems work, without all the manual effort!
Link To Code: https://github.com/aditya-taparia/RLPO
Primary Area: Deep Learning->Generative Models and Autoencoders
Keywords: Concept-based Explainable AI, TCAV, Vision-Language Models, Reinforcement Learning
Submission Number: 8185