Learned Visual Features to Textual Explanations

20 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: explainability, reliability
TL;DR: We propose a novel method that leverages the capabilities of large language models (LLMs) to interpret the *learned features* of pre-trained image classifiers
Abstract: Interpreting the learned features of vision models is a longstanding challenge in machine learning. To address it, we propose a novel method, TExplain, that leverages the capabilities of large language models (LLMs) to interpret the *learned features* of pre-trained image classifiers. TExplain trains a neural network to connect the feature space of an image classifier to that of an LLM. During inference, it generates a large number of sentences that explain the features the classifier has learned for a given image, then extracts the most frequent words from these sentences to summarize the features and patterns the classifier has captured. To our knowledge, our method is the first to use the frequent words associated with a visual representation to shed light on the decision-making process of an independently trained classifier, enabling the detection of spurious correlations and biases and a deeper understanding of its behavior. To validate the effectiveness of our approach, we conduct experiments on diverse datasets, including ImageNet-9L and Waterbirds. The results demonstrate the potential of our method to improve the interpretability and robustness of image classifiers.
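As a concrete illustration of the pipeline the abstract describes, here is a minimal, self-contained PyTorch sketch. Everything in it is an illustrative assumption: the `FeatureTranslator` architecture, the `ToyLM` stand-in for a frozen LLM, the vocabulary, the dimensions, and the sampling scheme are all hypothetical, since the abstract does not specify them. Only the overall flow follows the text: translate classifier features into the language model's space, sample many sentences conditioned on them, and count the most frequent words.

```python
# Hypothetical sketch of the TExplain pipeline described in the abstract.
# Architectures, dimensions, and sampling details are placeholders.
from collections import Counter

import torch
import torch.nn as nn

torch.manual_seed(0)

VISUAL_DIM, TEXT_DIM, VOCAB = 512, 64, 100
WORDS = [f"word{i}" for i in range(VOCAB)]  # toy vocabulary for illustration


class FeatureTranslator(nn.Module):
    """Trainable bridge from the classifier's feature space to the LLM's
    embedding space (the 'connection' network in the abstract)."""

    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(VISUAL_DIM, TEXT_DIM)

    def forward(self, visual_features):
        return self.proj(visual_features)


class ToyLM(nn.Module):
    """Stand-in for a frozen LLM that samples tokens conditioned on a
    prefix embedding. A real implementation would use a pretrained model."""

    def __init__(self):
        super().__init__()
        self.head = nn.Linear(TEXT_DIM, VOCAB)

    @torch.no_grad()
    def sample_sentence(self, prefix, max_len=8, temperature=1.0):
        logits = self.head(prefix) / temperature            # (1, VOCAB)
        probs = torch.softmax(logits, dim=-1).squeeze(0)    # (VOCAB,)
        ids = torch.multinomial(probs, num_samples=max_len, replacement=True)
        return [WORDS[i] for i in ids.tolist()]


@torch.no_grad()
def explain(visual_features, translator, lm, num_samples=100, top_k=5):
    """Sample many sentences from the feature-conditioned LM and return
    the most frequent words as the textual explanation."""
    counts = Counter()
    prefix = translator(visual_features)
    for _ in range(num_samples):
        counts.update(lm.sample_sentence(prefix))
    return counts.most_common(top_k)


# Usage: a random vector stands in for a classifier's feature for one image.
features = torch.randn(1, VISUAL_DIM)
print(explain(features, FeatureTranslator(), ToyLM()))
```

With a real frozen LLM in place of `ToyLM`, the frequent-word output of `explain` is what one would inspect for spurious correlations, e.g., background-related words dominating the explanation for an object class.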
Supplementary Material: zip
Primary Area: visualization or interpretation of learned representations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2875