Sparse Autoencoders are Rank-1 Mixture-of-Experts: A Low-Rank Architectural Duality

08 May 2026 (modified: 09 May 2026) · ICML 2026 Workshop CoLoRAI Submission · CC BY 4.0
Keywords: sparse autoencoders, mixture of experts, low-rank decomposition
TL;DR: TopK SAEs and rank-1 MoEs are the same architecture. Per-expert PC1 of W_down beats an untrained matched architecture by 20–23× on DeepSeek + OLMoE; Switch's load-balancing loss cuts SAE dead features by 6.2–18.6× over SAE baselines.
Abstract: The sparse autoencoder (SAE), the dominant tool for mechanistic interpretability of large language models, and the Mixture-of-Experts (MoE) layer, the dominant tool for scaling them, are presented as different architectures in their respective literatures. We prove they are the same low-rank decomposition. A TopK SAE with unit-norm decoder columns is functionally identical to a parameter-shared MoE in which each expert is a rank-1 linear map E_i = d_i e_i^T (an outer product of a router row and a decoder atom), with TopK token-choice routing selecting which k rank-1 components contribute per token. The MoE forward pass is then the order-3 tensor contraction y = Σ g_i(x) d_i (e_i · x) — a sparse CP-style sum of k active rank-1 outer products drawn from the dictionary of experts {E_i}. Three controlled gaps quantify the distance to a production MoE (FFN-expert rank, joint vs. post-hoc training, prediction-vs-reconstruction objective), each a deviation from the strict rank-1 identification. The duality has direct empirical force in both directions. First, MoE training tools port mechanically to SAE training: treating SAE encoder pre-activations as routing logits and applying the Switch auxiliary load-balancing loss reduces dead features by 6.2–18.6× over the SAE community's tuned mitigations on a synthetic ground-truth benchmark, is Pareto-dominant for SAE sparsity k_SAE ≥ 4, and has a triple-pinned causal mechanism. At LM scale (Pythia-160M, 3 seeds), Switch is the only method that caps the maximum-active firing rate at ~60% (vs. ~97% for the baseline and ghost runs) — the duality-predicted load-balancing signature on the rank-1 dictionary. Second, the duality is realised in production MoE training optima: the rank-1 implicit SAE built from a fine-grained MoE's router rows and per-expert PC1 of W_down (a Tucker-style mode-rank-1 truncation) reconstructs activations 20.2–23.5× better than the same architecture with random weights, across DeepSeek-MoE-16B and OLMoE-1B-7B at three layers each — a clean isolation of training (not architecture) as the source of the rank-1 structure. SAEs and MoEs are operating points on the same rank-1-dictionary family.
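To make the claimed identity concrete, the sketch below writes a TopK SAE forward pass both in the usual encode/decode form and as a sum of rank-1 expert contributions d_i (e_i · x), and computes a Switch-Transformer-style load-balancing loss from the encoder pre-activations treated as routing logits. This is a minimal illustration of the abstract's construction, not the authors' code: tensor shapes, variable names, the bias convention, the dense gating implementation, and the exact form and coefficient of the auxiliary loss are assumptions.

```python
import torch
import torch.nn.functional as F

def topk_sae_as_rank1_moe(x, W_enc, b_enc, W_dec, b_dec, k):
    """x: (B, d_model); W_enc: (m, d_model) with rows e_i;
    W_dec: (d_model, m) with unit-norm columns d_i.
    Bias conventions vary across SAE implementations; the simplest is used here."""
    pre = x @ W_enc.T + b_enc                              # encoder pre-activations = routing logits, (B, m)
    acts = F.relu(pre)
    vals, idx = acts.topk(k, dim=-1)                       # TopK token-choice routing
    gates = torch.zeros_like(acts).scatter(-1, idx, vals)  # sparse code, nonzero only on the k selected features

    # SAE view: decode the sparse code.
    y_sae = gates @ W_dec.T + b_dec

    # MoE view: each expert is the rank-1 map E_i x = d_i (e_i . x); under the
    # identification the SAE activation plays the role of the gated projection,
    # so the output is the same contraction over the k active rank-1 components.
    y_moe = torch.einsum('bi,di->bd', gates, W_dec) + b_dec
    assert torch.allclose(y_sae, y_moe, atol=1e-5)
    return y_sae, pre, gates

def switch_style_aux_loss(pre, gates, alpha=0.01):
    """Switch-Transformer-style load balancing adapted to SAE features:
    alpha * m * sum_i f_i * P_i, with f_i the fraction of tokens whose TopK
    selection includes feature i and P_i its mean softmax routing probability.
    The paper's exact formulation may differ from this f*P form."""
    m = pre.shape[-1]
    f = (gates > 0).float().mean(dim=0)         # realised dispatch fraction per feature
    P = F.softmax(pre, dim=-1).mean(dim=0)      # mean router probability per feature
    return alpha * m * (f * P).sum()
```

In training, such a term would simply be added to the reconstruction objective, e.g. loss = F.mse_loss(y_sae, x) + switch_style_aux_loss(pre, gates), so that under-used features receive pressure to fire, which is the mechanism behind the dead-feature reductions described above.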
Submission Number: 107