Timescale Separation in Sparse Dictionary Learning: Reconstruction Converges Before Reproducibility

Published: 29 May 2026, Last Modified: 29 May 2026
Venue: HiLD at ICML 2026 Poster
License: CC BY 4.0
Keywords: training dynamics, timescale separation, sparse autoencoders, dictionary learning, high-dimensional geometry, extreme value theory, null calibration, mean maximum cosine similarity, overcomplete representations, feature reproducibility, order statistics, mechanistic interpretability
TL;DR: SAE reconstruction loss saturates by epoch 20, while decoder reproducibility is still indistinguishable from random at that point and keeps climbing for 1,000+ epochs (asymptoting at 2.44× random); this timescale separation is measured against an EVT random-dictionary null.
Abstract: Sparse autoencoders (SAEs) trained on neural network activations reach low reconstruction loss within tens of epochs, yet cross-seed decoder reproducibility—measured by mean maximum cosine similarity (MMCS) calibrated against an extreme-value-theory random-dictionary null—remains indistinguishable from random dictionaries on the same timescale. We document this timescale separation in a mod-113 algorithmic transformer ($d=128$) and validate it across Pythia-70M, 160M, and 410M ($d=512$–$1024$). On mod-113, reconstruction loss saturates by epoch 20 while TopK decoder MMCS continues climbing for 1,000+ epochs, asymptoting at 0.73 (2.44× random). On Pythia-410M layer 21, a $d_\text{sae}$ sweep shows that all overparameterization ratios ($d_\text{sae}/\text{erank}$ from 1.3 to 10.0) converge to 4.3–4.7× random given sufficient training, dissolving the apparent $d_\text{sae}/\text{erank}$ stability threshold. We derive this null from the order statistics of inner products between random unit vectors, obtaining predictions within 0.03% of empirical baselines across dimensions 128–1024. Dead-neuron resampling produces a transient MMCS spike to 0.888 at epoch 10 that decays below the unregularized baseline—a phantom consistency artifact arising from training dynamics rather than learned structure.
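As a rough illustration of the random-dictionary null described in the abstract, the sketch below computes MMCS between two independently drawn random unit-norm dictionaries and compares it with a textbook Gaussian extreme-value approximation for the expected maximum of $n$ inner products. Everything here is an assumption made for illustration: the function names (`random_dictionary`, `mmcs`, `gaussian_evt_null`), the use of signed rather than absolute cosine similarity, the dictionary size of 512 atoms, and the Gaussian approximation itself, which is not necessarily the order-statistics derivation used in the paper.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329


def random_dictionary(n_atoms, d, rng):
    """Draw n_atoms random unit vectors in R^d (rows), uniform on the sphere."""
    w = rng.standard_normal((n_atoms, d))
    return w / np.linalg.norm(w, axis=1, keepdims=True)


def mmcs(a, b):
    """Mean maximum cosine similarity from dictionary a to dictionary b.

    Both inputs are (n_atoms, d) arrays with unit-norm rows. For each atom
    in a, take its best signed cosine match in b, then average.
    """
    sims = a @ b.T
    return float(sims.max(axis=1).mean())


def gaussian_evt_null(n_atoms, d):
    """Approximate expected MMCS between two random dictionaries.

    Assumption: inner products of random unit vectors in R^d are treated as
    N(0, 1/d), and the expected maximum of n standard normals is approximated
    by the classical Gumbel-regime formula b_n + gamma / a_n.
    """
    a_n = np.sqrt(2.0 * np.log(n_atoms))
    b_n = a_n - (np.log(np.log(n_atoms)) + np.log(4.0 * np.pi)) / (2.0 * a_n)
    return float((b_n + EULER_GAMMA / a_n) / np.sqrt(d))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n_atoms = 128, 512  # n_atoms is a hypothetical dictionary size
    dict_a = random_dictionary(n_atoms, d, rng)
    dict_b = random_dictionary(n_atoms, d, rng)
    empirical = mmcs(dict_a, dict_b)
    predicted = gaussian_evt_null(n_atoms, d)
    print(f"empirical null MMCS: {empirical:.4f}")
    print(f"Gaussian EVT approx: {predicted:.4f}")
```

Under this kind of calibration, a trained pair of decoders would be reported as the ratio of its MMCS to the null value, which is how a figure such as "2.44× random" can be read. The Gaussian approximation is only a coarse estimate and should not be expected to match empirical baselines as tightly as the 0.03% agreement quoted in the abstract.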
Submission Number: 68