Choosing the right basis for interpretability: Psychophysical comparison between neuron-based and dictionary-based representations

TMLR Paper7383 Authors

06 Feb 2026 (modified: 05 Mar 2026) · Under review for TMLR · CC BY 4.0
Abstract: Interpretability research often adopts a neuron-centric lens, treating individual neurons as the fundamental units of explanation. However, neuron-level explanations can be undermined by superposition, where single units respond to mixtures of unrelated patterns. Dictionary learning methods, such as sparse autoencoders and non-negative matrix factorization, offer a promising alternative by learning a new basis over layer activations. Despite this promise, direct human evaluations comparing neuron-based and dictionary-based representations remain limited. We conducted three large-scale online psychophysics experiments (N=481) comparing explanations derived from neuron-based and dictionary-based representations in two convolutional neural networks (ResNet50, VGG16). We operationalize interpretability via visual coherence: a basis is more interpretable if humans can reliably recognize a common visual pattern in its maximally activating images and generalize that pattern to new images. Across experiments, dictionary-based representations were consistently more interpretable than neuron-based representations, with the advantage increasing in deeper layers. Critically, because models differ in how neuron-aligned their representations are (ResNet50 exhibits greater superposition), neuron-based evaluations can mask cross-model differences, such that ResNet50's higher interpretability emerges only under dictionary-based comparisons. These results provide psychophysical evidence that dictionary-based representations offer a stronger foundation for interpretability and caution against model comparisons based solely on neuron-level analyses.
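For readers unfamiliar with the dictionary-based setup the abstract refers to, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a particular layer (ResNet50 layer3), component count, and a random stand-in image batch, and only shows how a non-negative matrix factorization basis can be fit to a layer's pooled activations and how maximally activating images for a dictionary direction can be retrieved (the neuron-based analogue would rank images by a single activation channel instead).

```python
# Sketch: fit an NMF dictionary to one layer's activations and find the
# images that maximally activate a chosen dictionary direction.
# Layer choice, component count, and the random image batch are assumptions.
import numpy as np
import torch
import torchvision.models as models
from sklearn.decomposition import NMF

model = models.resnet50(weights="IMAGENET1K_V1").eval()

# Collect activations from an intermediate layer via a forward hook.
acts = []
def hook(_module, _inp, out):
    # Global-average-pool the spatial dims -> one vector per image (B, C).
    acts.append(out.mean(dim=(2, 3)).detach().cpu())

model.layer3.register_forward_hook(hook)

# Placeholder probing set; in practice this would be a real image dataset,
# normalized for ImageNet.
images = torch.rand(256, 3, 224, 224)
with torch.no_grad():
    for batch in torch.split(images, 64):
        model(batch)

# (N, C) activation matrix; non-negative because layer3 ends in a ReLU.
A = torch.cat(acts).numpy()

# Learn a dictionary basis: A ~ W @ H, where rows of H are dictionary
# directions and W holds per-image coefficients.
nmf = NMF(n_components=128, init="nndsvda", max_iter=400, random_state=0)
W = nmf.fit_transform(A)

# Maximally activating images for one dictionary component.
component = 7
top_images = np.argsort(-W[:, component])[:9]
print("Top-activating image indices for component", component, ":", top_images)
```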
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Sinead_Williamson1
Submission Number: 7383