Explaining Latent Representations of Neural Networks with Archetypal Analysis

Published: 05 Nov 2025, Last Modified: 09 Dec 2025
Venue: NLDL 2026 Poster
License: CC BY 4.0
Keywords: Archetypal Analysis, Explainability, Vision Transformers
TL;DR: We use Archetypal Analysis to interpret neural network latent spaces, revealing how models encode and transform information across layers
Abstract: We apply Archetypal Analysis to the latent spaces of trained neural networks, offering interpretable explanations of their feature representations without relying on user-defined corpora. Through layer-wise analyses of convolutional networks and vision transformers across multiple classification tasks, we demonstrate that archetypes are robust and dataset-independent, and that they provide intuitive insights into how models encode and transform information from layer to layer. Our approach enables global insights by characterizing the unique structure of each layer's latent representation space, while also offering localized explanations of individual decisions as convex combinations of extreme points (i.e., archetypes).
Git: https://github.com/Wedenborg/Explaining-Latent-Representations-of-Neural-Networks-with-Archetypal-Analysis
Serve As Reviewer: ~Anna_Emilie_Jennow_Wedenborg1
Submission Number: 20
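
For reference, below is a minimal NumPy sketch of plain archetypal analysis applied to a matrix of latent activations, using projected gradient descent with a row-wise simplex projection. This is not the authors' implementation (see the linked repository for that); the function names `archetypal_analysis` and `project_simplex`, the gradient normalization, the learning rate, the iteration count, and the synthetic `latents` data are illustrative assumptions.

```python
import numpy as np

def project_simplex(V):
    """Project each row of V onto the probability simplex (Duchi et al., 2008)."""
    n, k = V.shape
    U = np.sort(V, axis=1)[:, ::-1]                  # rows sorted in descending order
    css = np.cumsum(U, axis=1)
    rho = np.arange(1, k + 1)
    r = (U + (1.0 - css) / rho > 0).sum(axis=1)      # support size after shifting
    theta = (css[np.arange(n), r - 1] - 1.0) / r
    return np.maximum(V - theta[:, None], 0.0)

def archetypal_analysis(X, n_archetypes=4, n_iter=2000, lr=1e-2, seed=0):
    """Fit X ~= A @ B @ X, with rows of A (n x k) and B (k x n) on the simplex.

    B @ X are the archetypes (convex combinations of latent points); each row of A
    holds the convex weights that reconstruct one latent point from the archetypes.
    """
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    A = project_simplex(rng.random((n, n_archetypes)))
    B = project_simplex(rng.random((n_archetypes, n)))
    for _ in range(n_iter):
        Z = B @ X                                    # current archetypes, shape (k, d)
        R = A @ Z - X                                # reconstruction residual
        grad_A = R @ Z.T                             # gradient of 0.5*||A B X - X||^2 w.r.t. A
        grad_B = A.T @ R @ X.T                       # gradient w.r.t. B (chain rule through Z)
        # Normalized projected gradient steps: a simple safeguard for this sketch,
        # not the optimizer used by the authors.
        A = project_simplex(A - lr * grad_A / (np.linalg.norm(grad_A) + 1e-12))
        B = project_simplex(B - lr * grad_B / (np.linalg.norm(grad_B) + 1e-12))
    return A, B @ X

# Usage: `latents` is synthetic data standing in for activations from one layer
# of a trained network (n samples x d latent dimensions).
latents = np.random.default_rng(1).normal(size=(200, 64))
A, archetypes = archetypal_analysis(latents, n_archetypes=4)
print(A[0])              # convex weights explaining sample 0 (non-negative, sum to 1)
print(archetypes.shape)  # (4, 64): one archetype per row of the latent space
```

In terms of the abstract, the rows of `A` give the localized explanations (each latent point as a convex combination of archetypes), while the archetypes themselves summarize the global structure of that layer's representation space.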