Abstract: WWW is a method that uses CLIP to compute the similarity between image and text features and assigns a concept to each neuron of the target model whose behavior is being explained. However, because this method computes similarity on center-cropped images, the crop may include features unrelated to the image's original class, so the similarity between the image and text may not be reflected correctly. Additionally, WWW uses cosine similarity to compare image and text features; cosine similarity can yield a broad similarity distribution that fails to capture the similarity between vectors precisely. To address these issues, we propose a method that leverages Grad-CAM to crop the region the model attends to, filtering out features unrelated to the original characteristics of the image. In addition, by measuring image-text similarity with t-vMF similarity, we achieve more accurate discovery of neuron concepts.
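As a rough illustration of the two ingredients, the sketch below shows (i) cropping an image to the bounding box of its high-attention Grad-CAM region and (ii) the t-vMF similarity, assuming the standard formulation, phi(cos theta) = (1 + cos theta) / (1 + kappa * (1 - cos theta)) - 1, which reduces to cosine similarity at kappa = 0 and sharpens the distribution as kappa grows. The threshold and kappa values and the helper names are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def crop_attention_region(image: np.ndarray, cam: np.ndarray,
                          threshold: float = 0.5) -> np.ndarray:
    """Crop an image to the bounding box of its high-attention region.

    `image` is H x W x C; `cam` is an H x W Grad-CAM heatmap scaled to
    [0, 1] (e.g. the output of a Grad-CAM library, resized to the image).
    The 0.5 threshold is an illustrative choice, not the paper's setting.
    """
    mask = cam >= threshold
    if not mask.any():               # no salient region: fall back to the full image
        return image
    rows, cols = np.where(mask)
    return image[rows.min():rows.max() + 1, cols.min():cols.max() + 1]

def t_vmf_similarity(a: np.ndarray, b: np.ndarray, kappa: float = 16.0) -> float:
    """t-vMF similarity between two feature vectors.

    With kappa == 0 this reduces to plain cosine similarity; larger kappa
    concentrates the similarity distribution so that only near-identical
    directions score close to 1. kappa = 16 is an illustrative value.
    """
    cos = float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return (1.0 + cos) / (1.0 + kappa * (1.0 - cos)) - 1.0
```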