Keywords: trustworthy AI, label errors, wrong labels, noisy labels, concept discovery, interpretability
TL;DR: We propose a novel method that automatically identifies high-level concepts at intermediate CNN layers and uses them to retrieve outlier training images with wrong or confounding labels.
Abstract: Producing reliable and trustworthy predictions from deep learning models is a major challenge, particularly in supervised settings with misleading training annotations. Concept-based explanations clarify the relevance of high-level concepts to the model's predictions, but they can be biased by the user's expectations about which concepts matter. Here we propose a post-hoc, unsupervised method that automatically discovers high-level concepts learned by the intermediate layers of vision models.
Through singular value decomposition of a layer's latent space, we discover concept vectors that correspond to orthogonal directions of high variance and that are relevant to the model's prediction. Most of the identified concepts are human-understandable, coherent, and relevant to the task. Moreover, using the discovered concepts, we identify training samples with confounding factors, which emerge as outliers.
Our method is straightforward to implement, and it can be easily adapted to interpret multiple architectures and to identify anomalies in the collected data.
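A minimal sketch of the core idea, assuming pooled intermediate-layer activations are already available as a matrix: SVD of the centered activations yields orthonormal high-variance directions used as candidate concept vectors, and samples with extreme projections onto these directions are flagged as potential outliers. The function names, the z-score outlier criterion, and the use of NumPy are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def discover_concepts(activations, n_concepts=10):
    """SVD of (pooled) layer activations; the leading right singular
    vectors serve as candidate concept directions (orthogonal, high variance)."""
    # activations: (n_samples, n_features), e.g. globally pooled CNN feature maps
    mean = activations.mean(axis=0, keepdims=True)
    centered = activations - mean
    # economy SVD: rows of Vt are orthonormal directions ordered by variance
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    return Vt[:n_concepts], S[:n_concepts], mean

def flag_outliers(activations, concepts, mean, z_thresh=3.0):
    """Project samples onto concept directions and flag those whose
    concept scores are extreme (possible wrong or confounding labels)."""
    scores = (activations - mean) @ concepts.T            # (n_samples, n_concepts)
    z = (scores - scores.mean(axis=0)) / (scores.std(axis=0) + 1e-8)
    return np.where(np.abs(z).max(axis=1) > z_thresh)[0]

# Toy usage with random features standing in for real CNN activations.
rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 256))
feats[:5] += 8.0                        # inject a few artificial outliers
concepts, strengths, mu = discover_concepts(feats, n_concepts=5)
print(flag_outliers(feats, concepts, mu))
```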