Dissecting and Steering Cell Identity in a Single-Cell Foundation Model Using Sparse Autoencoders

Published: 02 Mar 2026, Last Modified: 08 May 2026, MLGenX 2026 Tiny paper track, CC BY 4.0
Abstract: Single-cell Foundation Models (scFMs) have demonstrated remarkable capability in learning cellular representations, yet their internal mechanisms remain largely opaque. In this work, we apply Sparse Autoencoders (SAEs) to the residual stream of AIDO.Cell, a transformer-based scFM, to decompose its latent space into interpretable biological features. As a proof of concept, we trained TopK SAEs on the 12th transformer layer using the PBMC3K dataset. Using Gene Ontology enrichment to interpret features, we find that ~64% of trained SAE features achieve statistically significant biological annotations, compared with ~34% for the dense raw activations. Beyond interpretation, we demonstrate that these features can be used to functionally "steer" cell identity: amplifying and suppressing individual features (e.g., Viral Defense) drives symmetric changes in gene expression, with steered cell states aligning with the expected biological programs. Furthermore, we implement a contrastive steering method that automatically discovers sparse feature combinations that drive CD4+ T cells towards a CD8+ T cell phenotype. Inspection of the selected features reveals that the model has learned biologically relevant directions in latent space, which enable cell-type steering. Our findings show that an scFM can learn a decomposable and manipulable model of cell biology, enabling interpretable in silico experiments.
Submission Number: 56
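
To make the two operations named in the abstract concrete, the following is a minimal NumPy sketch of a TopK SAE forward pass and of single-feature steering. It is not the authors' implementation: the function names, the weight shapes, the ReLU on the kept activations, and the choice to add back the reconstruction error are all assumptions made for illustration, and no AIDO.Cell activations are involved.

```python
import numpy as np

def topk_sae_encode(x, W_enc, b_enc, k):
    """Encode activations into sparse features, keeping the k largest per row.

    x: (n_tokens, d_model) residual-stream activations (hypothetical input).
    W_enc: (d_model, n_features), b_enc: (n_features,).
    """
    pre = x @ W_enc + b_enc
    # Column indices of the k largest pre-activations in each row.
    idx = np.argpartition(pre, -k, axis=1)[:, -k:]
    rows = np.arange(pre.shape[0])[:, None]
    z = np.zeros_like(pre)
    # Assumption: clamp the kept values at zero so features are non-negative.
    z[rows, idx] = np.maximum(pre[rows, idx], 0.0)
    return z

def sae_decode(z, W_dec, b_dec):
    """Map sparse features back to the activation space."""
    return z @ W_dec + b_dec

def steer(x, W_enc, b_enc, W_dec, b_dec, k, feature, scale):
    """Scale one SAE feature and return the edited activations.

    scale > 1 amplifies the feature, scale = 0 suppresses it, and
    scale = 1 returns x unchanged. The reconstruction error is added
    back so that only the chosen feature's contribution is altered.
    """
    z = topk_sae_encode(x, W_enc, b_enc, k)
    error = x - sae_decode(z, W_dec, b_dec)
    z_steered = z.copy()
    z_steered[:, feature] *= scale
    return sae_decode(z_steered, W_dec, b_dec) + error
```

Under these assumptions, steering changes the activations only along the decoder row of the chosen feature, and only for inputs on which that feature is already active. An additive variant that shifts every input by a fixed multiple of the decoder direction is an equally plausible design.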