Explaining Latent Representations of Neural Networks with Archetypal Analysis

Published: 05 Nov 2025, Last Modified: 09 Dec 2025
Venue: NLDL 2026 Poster
License: CC BY 4.0
Keywords: Archetypal Analysis, Explainability, Vision Transformers
TL;DR: We use Archetypal Analysis to interpret neural network latent spaces, revealing how models encode and transform information across layers
Abstract: We apply Archetypal Analysis to the latent spaces of trained neural networks, offering interpretable explanations of their feature representations without relying on user-defined corpora. Through layer-wise analyses of convolutional networks and vision transformers across multiple classification tasks, we demonstrate that archetypes are robust and dataset-independent, and that they provide intuitive insights into how models encode and transform information from layer to layer. Our approach enables global insights by characterizing the unique structure of each layer's latent representation space, while also offering localized explanations of individual decisions as convex combinations of extreme points (i.e., archetypes).
Git: https://github.com/Wedenborg/Explaining-Latent-Representations-of-Neural-Networks-with-Archetypal-Analysis
Serve As Reviewer: ~Anna_Emilie_Jennow_Wedenborg1
Submission Number: 20
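
For reference, below is a minimal NumPy sketch of plain archetypal analysis applied to a matrix of latent activations, using projected gradient descent with a row-wise simplex projection. This is not the authors' implementation (see the linked repository for that); the function names `archetypal_analysis` and `project_simplex`, the gradient normalization, the learning rate, the iteration count, and the synthetic `latents` data are illustrative assumptions.

```python
import numpy as np

def project_simplex(V):
    """Project each row of V onto the probability simplex (Duchi et al., 2008)."""
    n, k = V.shape
    U = np.sort(V, axis=1)[:, ::-1]                  # rows sorted in descending order
    css = np.cumsum(U, axis=1)
    rho = np.arange(1, k + 1)
    r = (U + (1.0 - css) / rho > 0).sum(axis=1)      # support size after shifting
    theta = (css[np.arange(n), r - 1] - 1.0) / r
    return np.maximum(V - theta[:, None], 0.0)

def archetypal_analysis(X, n_archetypes=4, n_iter=2000, lr=1e-2, seed=0):
    """Fit X ~= A @ B @ X, with rows of A (n x k) and B (k x n) on the simplex.

    B @ X are the archetypes (convex combinations of latent points); each row of A
    holds the convex weights that reconstruct one latent point from the archetypes.
    """
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    A = project_simplex(rng.random((n, n_archetypes)))
    B = project_simplex(rng.random((n_archetypes, n)))
    for _ in range(n_iter):
        Z = B @ X                                    # current archetypes, shape (k, d)
        R = A @ Z - X                                # reconstruction residual
        grad_A = R @ Z.T                             # gradient of 0.5*||A B X - X||^2 w.r.t. A
        grad_B = A.T @ R @ X.T                       # gradient w.r.t. B (chain rule through Z)
        # Normalized projected gradient steps: a simple safeguard for this sketch,
        # not the optimizer used by the authors.
        A = project_simplex(A - lr * grad_A / (np.linalg.norm(grad_A) + 1e-12))
        B = project_simplex(B - lr * grad_B / (np.linalg.norm(grad_B) + 1e-12))
    return A, B @ X

# Usage: `latents` is synthetic data standing in for activations from one layer
# of a trained network (n samples x d latent dimensions).
latents = np.random.default_rng(1).normal(size=(200, 64))
A, archetypes = archetypal_analysis(latents, n_archetypes=4)
print(A[0])              # convex weights explaining sample 0 (non-negative, sum to 1)
print(archetypes.shape)  # (4, 64): one archetype per row of the latent space
```

In terms of the abstract, the rows of `A` give the localized explanations (each latent point as a convex combination of archetypes), while the archetypes themselves summarize the global structure of that layer's representation space.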