Features Emerge as Discrete States: The First Application of SAEs to 3D Representations

Published: 26 Jan 2026, Last Modified: 11 Apr 2026, Venue: ICLR 2026 Poster, License: CC BY 4.0
Keywords: sparse autoencoders, mechanistic interpretability, computer vision
TL;DR: We present the first application of SAEs to the 3D domain, framing model internals as a state-based feature space governed by phase transitions.
Abstract: Sparse Autoencoders (SAEs) have found human-interpretable features in LLM activations, clarifying how LLMs transform input to output. However, they have rarely been applied outside of text, limiting explorations of feature dynamics. We present the first application of SAEs to the 3D domain, analyzing the features found in 53k 3D objects encoded by a state-of-the-art 3D reconstruction VAE. We observe that the model encodes discrete rather than continuous features, leading to our key finding: the model's feature activations approximate a discrete state space, driven by phase-like transitions. Through this state space framework, we address three otherwise unintuitive behaviors — the preference for positional encoding features, the sigmoidal relationship between feature ablation and reconstruction loss, and the bimodal distribution of phase transition points. This final observation suggests the model redistributes superposition interference to prioritize the high-importance features. Our work not only catalogs and explains unexpected feature dynamics, but also provides a framework to explain the model's learning dynamics. The code is available at https://feature3d.github.io/Dora-SAE/.
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Submission Number: 9908