Finding Manifolds With Bilinear Autoencoders

Published: 30 Sept 2025, Last Modified: 13 Oct 2025, Mech Interp Workshop (NeurIPS 2025) Spotlight, CC BY 4.0
Keywords: Foundational work, Sparse Autoencoders
Other Keywords: Tensor Networks, Compositionality
TL;DR: Decomposing representations into polynomial latents using bilinear autoencoders.
Abstract: Sparse autoencoders are a standard tool for uncovering interpretable latent representations in neural networks. Yet their interpretation depends on the inputs, making their isolated study incomplete. Polynomials offer a solution: they serve as algebraic primitives that can be analysed without reference to the input and can describe structures ranging from linear concepts to complicated manifolds. This work uses bilinear autoencoders to efficiently decompose representations into quadratic polynomials. We discuss improvements that induce importance ordering, clustering, and activation sparsity. This is an initial step toward latents that are nonlinear yet analysable through their algebraic properties.
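The abstract describes encoding representations into latents that are quadratic polynomials of the input. A minimal sketch of what such a bilinear autoencoder could look like is given below; it is not the authors' implementation, and the module names, sizes, elementwise-product encoder, and plain MSE objective are all illustrative assumptions. Each latent here is a product of two linear maps, so it is a quadratic form in the input and can be inspected algebraically via its symmetric matrix, without reference to any particular input.

```python
# Hypothetical sketch (not the paper's code): a bilinear autoencoder whose
# latents are quadratic polynomials of the input, z_i = (w_i . x)(v_i . x).
import torch
import torch.nn as nn

class BilinearAutoencoder(nn.Module):
    def __init__(self, d_model: int, n_latents: int):
        super().__init__()
        self.W = nn.Linear(d_model, n_latents, bias=False)  # left linear factor
        self.V = nn.Linear(d_model, n_latents, bias=False)  # right linear factor
        self.decoder = nn.Linear(n_latents, d_model, bias=False)

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # Elementwise product of two linear maps: each latent is a quadratic
        # polynomial in x, with no input-dependent nonlinearity.
        return self.W(x) * self.V(x)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encode(x))

    def latent_quadratic_form(self, i: int) -> torch.Tensor:
        # Symmetric matrix Q_i such that z_i = x^T Q_i x, which can be
        # analysed (e.g. eigendecomposed) independently of any input.
        w_i, v_i = self.W.weight[i], self.V.weight[i]
        return 0.5 * (torch.outer(w_i, v_i) + torch.outer(v_i, w_i))

# Toy usage: reconstruct random activations with a plain MSE objective.
model = BilinearAutoencoder(d_model=64, n_latents=256)
x = torch.randn(32, 64)
loss = ((model(x) - x) ** 2).mean()
loss.backward()
```

In this sketch, analysing a latent reduces to studying its quadratic-form matrix, which is the sense in which polynomial latents can be examined without reference to the input; the importance ordering, clustering, and sparsity improvements mentioned in the abstract are not reflected here.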
Submission Number: 146