Keywords: single-cell foundation models, sparse autoencoders, interpretability, steering
TL;DR: We use sparse autoencoders to interpret single-cell foundation models, revealing that they capture diverse biological signals but fragment cell type information, and demonstrate that targeted feature interventions can improve batch integration.
Abstract: Single-cell foundation models (scFMs) hold promise for applications in cell type annotation and data integration, but their internal mechanisms remain poorly understood. We investigate the structure of these models by training sparse autoencoders (SAEs) on the hidden representations of two widely used scFMs, scGPT and scFoundation.
The learned features reveal diverse and complex biological and technical signals, which emerge even in pre-trained models.
We also observe that the encoding of this information differs between scFMs with distinct training protocols and architectures. Further, we find that while many features capture cell type information across several studies, they often fall short of unifying it into a single generalized representation. Finally, by intervening on SAE features, we can reduce unwanted technical effects while steering model outputs to preserve the core biological signal. These findings provide a path toward more interpretable and controllable single-cell foundation models.
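To make the intervention idea concrete, below is a minimal sketch of an SAE feature ablation applied to a model's hidden activation. All sizes, weights, and the ablated feature index are illustrative assumptions, not values from the paper; the patching convention (adding the SAE's reconstruction error back in) is one common choice, not necessarily the authors' exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: scFM hidden size and (overcomplete) SAE width.
d_model, d_sae = 8, 32
W_enc = rng.normal(size=(d_model, d_sae)) / np.sqrt(d_model)
b_enc = np.zeros(d_sae)
W_dec = rng.normal(size=(d_sae, d_model)) / np.sqrt(d_sae)
b_dec = np.zeros(d_model)

def encode(h):
    # ReLU encoder: nonnegative, sparsity-encouraged feature activations.
    return np.maximum(h @ W_enc + b_enc, 0.0)

def decode(f):
    # Linear decoder mapping features back to the hidden space.
    return f @ W_dec + b_dec

h = rng.normal(size=d_model)   # a hidden activation from the scFM
f = encode(h)                  # sparse SAE features
h_hat = decode(f)              # SAE reconstruction of h

# Intervention: zero out a feature (imagine one tied to a batch effect),
# then patch the edited reconstruction back into the activation while
# preserving the SAE's reconstruction error term (h - h_hat).
f_edit = f.copy()
f_edit[5] = 0.0                # feature index 5 is an arbitrary example
h_steered = h + decode(f_edit) - h_hat
```

With no features edited, `h_steered` reduces exactly to `h`, so the intervention only changes the activation through the ablated feature's decoder direction.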
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 14062