Decoding Alignment without Encoding Alignment: A Missing Component for Interpretability

Johannes Bertram; Luciano Dyballa; T. Anderson Keller; Savik Kinger; Steven W. Zucker

Published: 11 Jun 2026, Last Modified: 23 Jun 2026

Venue: Mech Interp Workshop ICML 2026 Virtual poster

License: CC BY 4.0

Keywords: Circuit Analysis, Attribution Graphs, Feature Geometry, Applications of interpretability

Other Keywords: Neuroscience

TL;DR: Representational alignment metrics measure the behavior of neural populations but are blind to the exact function that produces it. We propose encoding manifolds and Gromov–Wasserstein (GW) distance as complementary tools for analyzing functional alignment.

Abstract: RSA and CKA are standard metrics for comparing neural representations across brain regions, organisms, and deep learning models. We demonstrate a fundamental weakness: these decoding-based metrics are insensitive to encoding manifold topology — the internal functional organization of a neural population. In a controlled MNIST experiment, RSA, CKA, and Procrustes $R^2$ remain statistically unchanged when encoding topology is causally manipulated via an auxiliary clustering loss, while the two model populations differ significantly in attribution patterns, weight-graph assortativity, and out-of-distribution robustness. Across biological systems and machine learning models, similar decoding behavior can arise from small, non-representative subpopulations, and alignment metrics are insensitive to encoding manifold topology even when it is fundamentally altered. These findings bear directly on mechanistic interpretability: standard alignment metrics cannot distinguish whether two networks share the same computational circuits or merely produce indistinguishable aggregate outputs. We propose encoding manifolds and Gromov–Wasserstein distance as complementary diagnostics for any decoding-based similarity claim, and provide a Neural Manifold Explorer tool.
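
The blindness of decoding-based metrics to per-neuron organization can be illustrated with a small NumPy sketch. This is not the paper's MNIST experiment or its clustering loss. It is a toy example resting on a known property: linear CKA and distance-based RSA are invariant to orthogonal transformations of neuron space. The synthetic population `X`, the random rotation `Q`, and the helper names `linear_cka` and `rsa_score` are assumptions introduced here for illustration only.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two (stimuli x neurons) response matrices."""
    X = X - X.mean(axis=0, keepdims=True)  # center each neuron's responses
    Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    return cross / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

def rsa_score(X, Y):
    """RSA as the Pearson correlation between Euclidean-distance RDMs."""
    def rdm(Z):
        sq = np.sum(Z ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
        return np.sqrt(np.clip(d2, 0.0, None))
    iu = np.triu_indices(X.shape[0], k=1)  # off-diagonal stimulus pairs only
    return np.corrcoef(rdm(X)[iu], rdm(Y)[iu])[0, 1]

rng = np.random.default_rng(0)
n_stimuli, n_neurons = 200, 50

# Hypothetical population: rectified Gaussian responses (an assumption).
X = np.maximum(rng.normal(size=(n_stimuli, n_neurons)), 0.0)

# Random orthogonal mixing: stimulus geometry is preserved exactly,
# but every individual neuron's tuning curve is changed.
Q, _ = np.linalg.qr(rng.normal(size=(n_neurons, n_neurons)))
Y = X @ Q

cka = linear_cka(X, Y)
rsa = rsa_score(X, Y)

# Mean absolute correlation between the tuning curves of index-matched neurons.
tuning_corr = np.mean(
    [abs(np.corrcoef(X[:, i], Y[:, i])[0, 1]) for i in range(n_neurons)]
)

print(f"linear CKA: {cka:.6f}")
print(f"RSA: {rsa:.6f}")
print(f"mean |tuning correlation|: {tuning_corr:.3f}")
```

In this toy setting both alignment scores equal 1 up to floating-point error, while the tuning curves of matched neurons are only weakly correlated. A diagnostic that operates on the neurons themselves, such as the encoding-manifold and Gromov–Wasserstein analyses the abstract proposes, is needed to see the difference between the two populations.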

Submission Number: 330
