Keywords: world models, representation learning, sparse autoencoders
TL;DR: We replace dense visual features with sparse, interpretable codes so world models can plan faster and more robustly without sacrificing performance.
Abstract: World models promise efficient prediction, imagination, and planning by operating in a compact latent space, yet prevailing approaches inherit \emph{dense, entangled} visual features from large pretrained encoders. Such latents conflate unrelated factors and contain redundant dimensions, undermining intervention fidelity, inflating planning cost, and reducing robustness to distribution shifts. We propose \textbf{Sparse World Models (SWMs)}, which learn and plan \emph{entirely in a sparse feature space}. SWMs obtain selectively active codes by training a sparse autoencoder (SAE) to translate dense vision embeddings into an overcomplete but \emph{sparse} vocabulary, then use these codes for state estimation, dynamics learning, and action optimization. By aligning units to meaningful factors, SWMs enable targeted interventions and attribution while shrinking the optimization search space. We further introduce an evaluation suite that probes feature capacity and links sparsity to planning outcomes. Across our studies, sparse representations reduce polysemanticity and maintain planning performance while improving efficiency and interpretability.
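For readers unfamiliar with the SAE translation step the abstract describes, the following is a minimal sketch of training a sparse autoencoder that maps dense vision embeddings to an overcomplete sparse code. All names, dimensions, and the L1 sparsity penalty here are illustrative assumptions, not the authors' implementation:

```python
# Minimal SAE sketch: dense vision embeddings -> overcomplete sparse codes.
# Dimensions, expansion factor, and L1 penalty are assumed for illustration.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_dense=768, expansion=8):
        super().__init__()
        d_sparse = d_dense * expansion           # overcomplete code vocabulary
        self.encoder = nn.Linear(d_dense, d_sparse)
        self.decoder = nn.Linear(d_sparse, d_dense)

    def forward(self, x):
        z = torch.relu(self.encoder(x))          # non-negative, selectively active codes
        x_hat = self.decoder(z)                  # reconstruction back to dense space
        return x_hat, z

sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_coeff = 1e-3                                  # assumed sparsity strength

# Stand-in batch of dense embeddings (e.g., from a frozen vision backbone).
x = torch.randn(32, 768)
x_hat, z = sae(x)
loss = nn.functional.mse_loss(x_hat, x) + l1_coeff * z.abs().mean()
loss.backward()
opt.step()
```

Under this reading, the sparse codes z (rather than x) would serve as the state for downstream dynamics learning and action optimization; how the paper trains and regularizes its SAE may differ.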
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 21681