Track: Regular Track (Page limit: 6-8 pages)
Keywords: Single-cell analysis, mechanistic interpretability, transcoder, circuit analysis
TL;DR: We use transcoders to extract sparse circuits from Cell2Sentence (C2S), enabling mechanistic, biologically grounded interpretations of its internal decision-making process.
Abstract: Single-cell foundation models (scFMs) have demonstrated
state-of-the-art performance on various tasks, such as cell-type
annotation and perturbation response prediction, by learning gene
regulatory networks from large-scale transcriptome data. However, a
significant challenge remains: the decision-making processes of
these models are less interpretable than those of traditional methods
such as differential gene expression analysis. Recently, transcoders
have emerged as a promising approach for extracting interpretable
decision circuits from large language models (LLMs). In this work,
we train transcoders on all 24 layers of the Cell2Sentence (C2S)
model, a state-of-the-art scFM, and develop systematic pipelines for
biological interpretation. Our analysis reveals that over 80% of
transcoder features across most layers are biologically
interpretable through Gene Set Enrichment Analysis (GSEA). Through a
case study on endothelial cell classification, we demonstrate that
the extracted circuits correctly identify cell-type-specific genes and
are significantly enriched for relevant pathways (FDR = 0.0013), confirming
that transcoders can identify internal features of complex single-cell
models that align with biological knowledge.
Submission Number: 47