Track: long paper (up to 8 pages)
Keywords: Sparse Autoencoders, Unsupervised Learning, Image Reconstruction, Interpretable Models, Computational Efficiency
TL;DR: We introduce Tensor-SAE, a structured sparse autoencoder that decodes through a learned bank of rank-1 tensor atoms (color × height × width).
Abstract: We introduce Tensor-SAE, a structured sparse autoencoder that decodes through a
learned bank of rank-1 tensor atoms (color × height × width). By factorizing the
decoder into separable color and spatial factors and applying a light sparsity prior
on latent activations, Tensor-SAE induces compact, interpretable representations
that enable linear, spatially localized, and semantically meaningful interventions in
image reconstructions. Unlike unconstrained dense or convolutional decoders that
distribute information diffusely, Tensor-SAE enforces a strong inductive bias that
trades some raw pixel-level fidelity for computational efficiency, interpretability,
and controllability. We evaluate Tensor-SAE on CIFAR-10 against two baselines
(a parameter-matched Dense-SAE and a ConvAE scaled to the same parameter budget). Our empirical suite (six figures) demonstrates that Tensor-SAE: (1) learns
low-entropy spatial atoms and clean color factors; (2) yields linearly predictable
intervention effects (R² ≈ 0.93) enabling controllable color edits; (3) achieves
superior reconstruction efficiency per FLOP and per parameter; (4) produces consistently sparse latents; and (5) stabilizes intervention strength during training. We
discuss trade-offs, limitations, and the application of Tensor-SAE as a building
block for interpretable, compute-efficient generative systems.
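The rank-1 decoder described above can be made concrete with a short sketch. This is a minimal illustration, not the authors' implementation: the function and variable names (`decode`, `color`, `height`, `width`, the atom count `K`) are assumptions. Each atom k factorizes into a color vector (length 3), a height vector (length H), and a width vector (length W), and the reconstruction is the activation-weighted sum of the outer products:

```python
import numpy as np

def decode(z, color, height, width):
    """Rank-1 tensor-atom decoder (illustrative sketch).

    z      : (K,)   latent activations (sparse, e.g. ReLU outputs)
    color  : (K, 3) per-atom color factors
    height : (K, H) per-atom vertical spatial factors
    width  : (K, W) per-atom horizontal spatial factors

    Returns a (3, H, W) image equal to
        sum_k z[k] * color[k] (outer) height[k] (outer) width[k].
    """
    return np.einsum("k,kc,kh,kw->chw", z, color, height, width)

rng = np.random.default_rng(0)
K, H, W = 16, 32, 32
z = np.maximum(rng.normal(size=K), 0.0)  # non-negative, roughly sparse code
color = rng.normal(size=(K, 3))
height = rng.normal(size=(K, H))
width = rng.normal(size=(K, W))
img = decode(z, color, height, width)
print(img.shape)  # (3, 32, 32)
```

Because each atom is separable, scaling a single activation `z[k]` rescales one color × spatial pattern linearly, which is the property behind the linearly predictable interventions the abstract reports.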
Anonymization: This submission has been anonymized for double-blind review by removing identifying information such as names, affiliations, and URLs.
Submission Number: 102