Keywords: representational alignment, superposition, theory, neural geometry, sparse autoencoder, universality, linear regression, disentanglement
TL;DR: Neural networks often appear misaligned not because they learn different things, but because their neurons represent different mixtures of the exact same underlying features.
Abstract: Neural networks trained on the same tasks achieve similar performance, but this is
not always reflected in their measured representational alignment. We propose that
this discrepancy arises from superposition or mixed selectivity, where individual
neurons represent mixtures of features. Consequently, two networks representing
an identical set of features can appear dissimilar if their neurons mix those features
differently. This may explain why higher-dimensional networks, which are
less prone to compressing mixtures of features, often show better alignment than
smaller models with greater behavioral similarity. We formalize this through an
analytic theory predicting apparent misalignment for common linear metrics like
Representational Similarity Analysis (RSA) and Linear Regression, validating it
from random projections to real neural networks. Using sparse autoencoders and
K-Means to extract disentangled features while controlling for dimensionality, we
find that feature-based alignment reveals higher similarity, particularly for early
and lower-dimensional regions. Some comparisons show decreased alignment
with disentanglement, and RSA and Linear Regression often disagree in these
cases. Simulations predict that higher RSA relative to Linear Regression in neural
space indicates shared inductive biases, a pattern confirmed in real data. Our
results demonstrate that superposition and dimensionality interactions obscure the
true alignment of lower-dimensional systems, while feature-based alignment allows
us to more directly interrogate performance-relevant sources of misalignment,
with important implications for model selection.
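
The core claim can be illustrated with a toy simulation. The sketch below is not the paper's code: it assumes two hypothetical "networks" that share the same sparse ground-truth features and differ only in the random matrix that mixes those features into a smaller number of neurons. The specific sizes, the sparsity level, the linear-kernel form of RSA, and the in-sample R² used for Linear Regression are all illustrative assumptions. Under these assumptions, both metrics are near 1 in feature space and noticeably lower in neuron space, even though the two systems encode identical features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): more features than neurons forces superposition.
n_samples, n_features, n_neurons = 500, 64, 16

# Sparse, non-negative ground-truth features shared by both toy networks.
active = rng.random((n_samples, n_features)) < 0.05
Z = active * rng.random((n_samples, n_features))

# Each toy network compresses the same features with its own random mixing matrix.
A = rng.standard_normal((n_features, n_neurons)) / np.sqrt(n_neurons)
B = rng.standard_normal((n_features, n_neurons)) / np.sqrt(n_neurons)
X, Y = Z @ A, Z @ B  # neuron-space activations


def rsa(U, V):
    """Pearson correlation between the off-diagonal entries of the
    centered linear-kernel similarity matrices of U and V."""
    Uc, Vc = U - U.mean(axis=0), V - V.mean(axis=0)
    Gu, Gv = Uc @ Uc.T, Vc @ Vc.T
    iu = np.triu_indices(len(U), k=1)
    return np.corrcoef(Gu[iu], Gv[iu])[0, 1]


def linreg_r2(U, V):
    """In-sample R^2 of an ordinary least-squares map from U to V."""
    Uc, Vc = U - U.mean(axis=0), V - V.mean(axis=0)
    W, *_ = np.linalg.lstsq(Uc, Vc, rcond=None)
    resid = Vc - Uc @ W
    return 1.0 - (resid ** 2).sum() / (Vc ** 2).sum()


# Feature space: the two networks are identical by construction.
rsa_features = rsa(Z, Z)
r2_features = linreg_r2(Z, Z)

# Neuron space: same features, different mixtures.
rsa_neurons = rsa(X, Y)
r2_neurons = linreg_r2(X, Y)

print(f"feature space: RSA={rsa_features:.3f}, R2={r2_features:.3f}")
print(f"neuron space:  RSA={rsa_neurons:.3f}, R2={r2_neurons:.3f}")
```

In practice the ground-truth features `Z` are not available; the abstract describes recovering disentangled features with sparse autoencoders or K-Means before computing alignment, a step this sketch skips.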
Primary Area: applications to neuroscience & cognitive science
Submission Number: 21678