Abstract: Single-cell Foundation Models (scFMs) have demonstrated remarkable capability
in learning cellular representations, yet their internal mechanisms remain largely
opaque. In this work, we apply Sparse Autoencoders (SAEs) to the residual stream
of AIDO.Cell, a transformer-based scFM, to decompose its latent space into interpretable biological features. We trained TopK SAEs on the 12th transformer
layer using the PBMC3K dataset as a proof of concept. Using Gene Ontology enrichment to interpret features, we find that ∼64% of trained SAE features achieve
statistically significant biological annotations, compared to ∼34% from the dense
raw activations. Beyond interpretation, we demonstrate that these features can be
used to functionally "steer" cell identity: amplifying and suppressing individual
features (e.g., Viral Defense) drives symmetric changes in gene expression, with
steered cell states aligning with the expected biological programs. Furthermore,
we implement a contrastive steering method to automatically discover sparse feature combinations that drive CD4+ T cells towards a CD8+ T cell phenotype.
Inspection of the selected features revealed that the model learned biologically
relevant directions in latent space enabling cell-type steering. Our findings show
that an scFM can learn a decomposable and manipulable model of cell biology,
enabling interpretable in silico experiments.
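
The TopK SAE decomposition and the single-feature steering described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the weight layout (`W_enc`, `b_enc`, `W_dec`, `b_dec`), the ReLU applied to the kept activations, the multiplicative `scale`, and the choice to add the SAE reconstruction error back after steering are all assumptions made for illustration.

```python
import numpy as np

def topk_sae_encode(x, W_enc, b_enc, k):
    """Encode activations x, keeping only the k largest pre-activations per row."""
    pre = x @ W_enc + b_enc
    # Indices of the k largest pre-activations along the feature axis.
    idx = np.argpartition(pre, -k, axis=-1)[..., -k:]
    vals = np.take_along_axis(pre, idx, axis=-1)
    z = np.zeros_like(pre)
    # Assumption: negative values among the top k are clipped to zero.
    np.put_along_axis(z, idx, np.maximum(vals, 0.0), axis=-1)
    return z

def sae_decode(z, W_dec, b_dec):
    """Map sparse feature activations back to the residual-stream space."""
    return z @ W_dec + b_dec

def steer(x, W_enc, b_enc, W_dec, b_dec, k, feature, scale):
    """Rescale one SAE feature and return the edited residual activations.

    scale > 1 amplifies the feature, 0 <= scale < 1 suppresses it.
    A feature that is inactive for a cell (activation 0) is left unchanged.
    """
    z = topk_sae_encode(x, W_enc, b_enc, k)
    # Assumption: keep whatever the SAE fails to reconstruct, so that
    # scale == 1 returns the original activations.
    error = x - sae_decode(z, W_dec, b_dec)
    z_steered = z.copy()
    z_steered[..., feature] *= scale
    return sae_decode(z_steered, W_dec, b_dec) + error

# Toy usage with random weights (hypothetical sizes, not those of AIDO.Cell).
rng = np.random.default_rng(0)
d_model, n_features, k = 8, 32, 4
x = rng.normal(size=(5, d_model))
W_enc = rng.normal(size=(d_model, n_features))
b_enc = np.zeros(n_features)
W_dec = rng.normal(size=(n_features, d_model))
b_dec = np.zeros(d_model)

z = topk_sae_encode(x, W_enc, b_enc, k)
x_amplified = steer(x, W_enc, b_enc, W_dec, b_dec, k, feature=3, scale=5.0)
```

In this sketch, the edited activations would then be passed through the remaining layers of the model to read out the change in predicted gene expression; that step depends on the model's own interface and is not shown.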
Submission Number: 56