DORA: Exploring Outlier Representations in Deep Neural Networks

Published: 04 Mar 2023, Last Modified: 21 Apr 2024 · ICLR 2023 Workshop on Trustworthy ML (Poster)
Keywords: Explainable AI, Machine Learning
TL;DR: In this paper, we introduce DORA: the first automatic data-agnostic method for the detection of outlier representations in Deep Neural Networks.
Abstract: Although Deep Neural Networks (DNNs) are remarkably effective at learning complex abstractions, they are also prone to unintentionally picking up spurious artifacts from the training data. To ensure model transparency, it is crucial to examine the relationships between learned representations, as unintended concepts often appear anomalous with respect to the desired task. In this work, we introduce DORA (Data-agnOstic Representation Analysis): the first *data-agnostic* framework for analyzing the representation space of DNNs. Our framework employs the proposed *Extreme-Activation* (EA) distance measure between representations, which exploits the self-explaining capabilities of the network itself without accessing any data. We quantitatively validate the metric's correctness and its alignment with human-defined semantic distances. The coherence between the EA distance and human judgment allows us to detect outliers in functional distance and thereby flag representations whose underlying concepts humans would consider unnatural. Finally, we demonstrate the practical usefulness of DORA by analyzing and identifying artifact representations in popular Computer Vision models.
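The abstract describes measuring a functional distance between neurons without any data: each neuron's Activation-Maximization Signal (AMS) is synthesized from the network itself, fed back in, and the resulting cross-activations define how close two representations are. Below is a minimal, hypothetical PyTorch sketch of that general idea; the model, layer choice, hyperparameters, and the exact distance formula are illustrative assumptions, not the paper's implementation or the DORA library's API.

```python
# Hypothetical sketch of a data-agnostic, "Extreme-Activation"-style analysis,
# reconstructed from the abstract alone. Model choice, layer, hyperparameters,
# and the distance formula are illustrative assumptions, not the paper's method.
import torch
import torchvision.models as models


def synthesize_ams(model, layer, unit, steps=200, lr=0.05, size=224):
    """Gradient-ascent Activation-Maximization Signal (AMS) for one unit,
    starting from random noise instead of any training data."""
    x = torch.randn(1, 3, size, size, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    acts = {}
    handle = layer.register_forward_hook(lambda m, i, o: acts.update(out=o))
    for _ in range(steps):
        opt.zero_grad()
        model(x)
        loss = -acts["out"][0, unit].mean()  # maximize the chosen unit's activation
        loss.backward()
        opt.step()
    handle.remove()
    return x.detach()


def ea_distance_matrix(model, layer, units):
    """Pairwise functional distances: feed each unit's AMS back into the
    network, record how strongly it activates all units, and turn the
    correlation of these activation profiles into a distance in [0, 1]."""
    acts = {}
    handle = layer.register_forward_hook(lambda m, i, o: acts.update(out=o))
    profiles = torch.zeros(len(units), len(units))
    for row, unit in enumerate(units):
        ams = synthesize_ams(model, layer, unit)
        with torch.no_grad():
            model(ams)
        profiles[row] = acts["out"][0, units].mean(dim=(-2, -1))
    handle.remove()
    corr = torch.corrcoef(profiles)          # similarity of activation profiles
    return torch.sqrt(0.5 * (1.0 - corr).clamp(min=0.0))
```

A possible usage sketch (again hypothetical): units that sit unusually far from all others in EA distance are candidate outlier or artifact representations.

```python
model = models.resnet18(weights="IMAGENET1K_V1").eval()
for p in model.parameters():
    p.requires_grad_(False)           # only the synthetic input is optimized
layer = model.layer4[-1]
D = ea_distance_matrix(model, layer, units=list(range(32)))
outlier_score = D.mean(dim=1)         # higher = more isolated representation
```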
Community Implementations: [7 code implementations](https://www.catalyzex.com/paper/arxiv:2206.04530/code)