Geometry, Not Scale Alone, Predicts Sparse Recovery of Causal Subspaces

Published: 29 May 2026, Last Modified: 29 May 2026
Venue: HiLD at ICML 2026 Poster
License: CC BY 4.0
Keywords: high-dimensional learning, sparse recovery, representation geometry, sparse autoencoders, causal subspaces, dictionary learning, stable rank, decoder geometry, mechanistic interpretability, scaling
TL;DR: Sparse causal recovery is governed by decoder geometry, not scale alone: CSD shows when dense causal subspaces are recoverable in SAE bases and when random-K or sparse-limited failures dominate.
Abstract: Whether high-dimensional causal representations become easier to recover in sparse bases is not determined by model size alone. We study sparse decomposability: the extent to which a dense causal subspace localized by distributed alignment search (DAS) can be expressed by a small set of pretrained sparse autoencoder (SAE) latents at a particular model-site-dictionary tuple. Causal sparse distillation (CSD) measures this property by matching a dense causal teacher with an SAE-constrained student. The central finding is geometric: two compact pre-CSD decoder statistics, the top-K decoder-subspace cosine with the teacher and the decoder stable rank, predict CSD/dense recovery across dense-valid Gemma/Qwen tuples with leave-one-out R² = 0.89, leave-one-model-out R² = 0.80, and bootstrap 95% CI [0.79, 0.95], while model size alone gives R² = -2.00. This explains why some larger-model sites recover cleanly, others are random-K degenerate, and a 27B dense-valid site remains sparse-limited. A ground-truth synthetic battery and cross-family checks on Gemma, Llama, and Qwen-Scope calibrate the measurement: MCQA gives selector-specific positives at valid sites, while RAVEL shows SAE-coordinate recovery that matched random-K controls can render selector-uninformative. High CSD/dense recovery is therefore not by itself evidence of meaningful feature selection. The result is a high-dimensional learning story about how dictionary geometry, scale, and causal subspace orientation jointly control sparse recovery.
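As a rough illustration of the two decoder statistics named in the abstract, the following Python sketch computes a stable rank and a top-K subspace cosine with NumPy. The stable rank uses the standard definition (squared Frobenius norm over squared spectral norm). The abstract does not define the top-K decoder-subspace cosine, so `topk_subspace_cosine` is a hypothetical proxy: it picks the K decoder directions best aligned with the teacher subspace and averages the cosines of the principal angles between the two spans. The function names, the row-wise decoder layout, and the selection rule are all assumptions made for illustration, not the paper's implementation.

```python
import numpy as np


def stable_rank(W):
    """Stable rank ||W||_F^2 / ||W||_2^2 (standard definition)."""
    s = np.linalg.svd(W, compute_uv=False)  # singular values, descending
    return float(np.sum(s ** 2) / s[0] ** 2)


def topk_subspace_cosine(decoder, teacher_basis, k):
    """Hypothetical proxy for a top-K decoder-subspace cosine.

    decoder:       (n_latents, d_model), one decoder direction per row
                   (assumed layout; rows must be nonzero).
    teacher_basis: (d_model, r), columns span the dense teacher subspace.
    k:             number of decoder directions to keep.
    """
    # Orthonormal basis for the teacher subspace.
    q_teacher, _ = np.linalg.qr(teacher_basis)

    # Unit-normalize decoder rows, then score each row by the norm of its
    # projection onto the teacher subspace (its cosine with that subspace).
    rows = decoder / np.linalg.norm(decoder, axis=1, keepdims=True)
    alignment = np.linalg.norm(rows @ q_teacher, axis=1)

    # Keep the k best-aligned rows and orthonormalize their span.
    top = np.argsort(alignment)[-k:]
    q_decoder, _ = np.linalg.qr(rows[top].T)

    # Singular values of Q_t^T Q_d are the cosines of the principal angles.
    cosines = np.linalg.svd(q_teacher.T @ q_decoder, compute_uv=False)
    return float(np.mean(np.clip(cosines, 0.0, 1.0)))
```

Under these assumptions, a decoder whose best-aligned K rows span the teacher subspace exactly scores 1, and a decoder orthogonal to it scores 0. A predictor like the one described in the abstract would then regress CSD/dense recovery on these two numbers per model-site-dictionary tuple.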
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 65