Transcoder-based Circuit Analysis for Interpretable single-Cell Foundation Models

Published: 11 Nov 2025, Last Modified: 23 Dec 2025, XAI4Science Workshop 2026, CC BY 4.0
Track: Regular Track (Page limit: 6-8 pages)
Keywords: Single-cell analysis, mechanistic interpretability, transcoder, circuit analysis
TL;DR: We use transcoders to extract sparse circuits from C2S, enabling mechanistic, biologically grounded interpretations of its internal decision-making process.
Abstract: Single-cell foundation models (scFMs) have demonstrated state-of-the-art performance on tasks such as cell-type annotation and perturbation response prediction by learning gene regulatory networks from large-scale transcriptome data. However, a significant challenge remains: the decision-making processes of these models are less interpretable than traditional methods such as differential gene expression analysis. Recently, transcoders have emerged as a promising approach for extracting interpretable decision circuits from large language models (LLMs). In this work, we train transcoders on all 24 layers of the cell2sentence (C2S) model, a state-of-the-art scFM, and develop systematic pipelines for biological interpretation. Our analysis reveals that over 80% of transcoder features across most layers are biologically interpretable through Gene Set Enrichment Analysis (GSEA). Through a case study on endothelial cell classification, we demonstrate that the extracted circuits correctly identify cell-type-specific genes and significantly enrich for relevant pathways (FDR = 0.0013), confirming that transcoders can identify internal features aligned with biological knowledge within complex single-cell models.
Submission Number: 47