Single-Cell Concept Bottleneck Generative Models for Interpretable and Controllable Cellular Editing
Abstract: Understanding how cellular phenotypes change in response to genetic and/or chemical interventions is fundamental to cell biology and to the design of effective and safe therapeutics. Single-cell RNA sequencing (scRNA-seq) enables characterization at cellular resolution, yet the combinatorial space of possible interventions makes exhaustive experimental mapping infeasible. In silico counterfactual generation is an attractive alternative, but current scRNA-seq generative methods fall short: disentanglement models rarely support interventions, and most intervention-based approaches perform conditional generation, synthesizing new cells rather than editing existing ones.
We introduce Single-Cell Concept Bottleneck Generative Models (scCBGMs), which unify counterfactual reasoning and generative modeling to overcome these limitations. scCBGM adapts concept bottleneck architectures to single-cell data through decoder skip connections and a cross-covariance penalty that promotes disentanglement without dimensional constraints. Counterfactual edits follow a principled abduction–action–prediction procedure that supports fine-grained per-cell perturbations and zero-shot generalization to novel concept combinations. Moreover, conditioning modern generative procedures, such as flow matching, on scCBGM embeddings combines state-of-the-art single-cell data generation quality with precise controllability.
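To make the two mechanisms named above more concrete, the following is a minimal NumPy sketch under stated assumptions; it is not the scCBGM implementation. It assumes an XCov-style cross-covariance penalty (half the sum of squared batch cross-covariances between concept activations c and unconstrained latents z) and a toy linear encoder/decoder. The weights W_c, W_z, D_c, D_z and the helpers encode, decode and counterfactual_edit are hypothetical, and the decoder skip connections are not modeled. The edit follows abduction (infer the cell's own residual z), action (overwrite one concept) and prediction (decode the edited concepts with the same z).

```python
import numpy as np

# Hypothetical toy dimensions; not taken from the paper.
N_GENES, N_CONCEPTS, N_LATENT = 50, 3, 4
_rng = np.random.default_rng(0)

# Hypothetical linear encoder heads (concepts, residual) and decoder weights.
W_c = _rng.normal(scale=0.1, size=(N_GENES, N_CONCEPTS))
W_z = _rng.normal(scale=0.1, size=(N_GENES, N_LATENT))
D_c = _rng.normal(size=(N_CONCEPTS, N_GENES))
D_z = _rng.normal(size=(N_LATENT, N_GENES))


def cross_covariance_penalty(c, z):
    """Assumed XCov form: 0.5 * sum_{i,j} Cov(c_i, z_j)^2 over a batch of cells."""
    c_centered = c - c.mean(axis=0, keepdims=True)
    z_centered = z - z.mean(axis=0, keepdims=True)
    cov = c_centered.T @ z_centered / c.shape[0]  # shape (n_concepts, n_latent)
    return 0.5 * float(np.sum(cov ** 2))


def encode(x):
    """Map expression x (cells x genes) to concept probabilities c and residual z."""
    c = 1.0 / (1.0 + np.exp(-(x @ W_c)))
    z = x @ W_z
    return c, z


def decode(c, z):
    """Reconstruct expression from concepts and residual latents (toy linear decoder)."""
    return c @ D_c + z @ D_z


def counterfactual_edit(x, concept_idx, value):
    """Abduction-action-prediction edit of one concept for every cell in x."""
    c, z = encode(x)              # abduction: infer each cell's own residual z
    c_cf = c.copy()
    c_cf[:, concept_idx] = value  # action: intervene on a single concept
    return decode(c_cf, z)        # prediction: decode with the same z


if __name__ == "__main__":
    cells = _rng.normal(size=(8, N_GENES))
    c, z = encode(cells)
    print("XCov penalty:", cross_covariance_penalty(c, z))
    edited = counterfactual_edit(cells, concept_idx=0, value=1.0)
    print("mean |change|:", np.abs(edited - decode(c, z)).mean())
```

Because z is inferred from the cell being edited rather than resampled, whatever the edited concept does not explain is carried over unchanged; this is what distinguishes editing an existing cell from conditionally generating a new one.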
Across multiple real datasets, as well as a newly developed synthetic benchmark with ground-truth counterfactuals that enables rigorous evaluation, scCBGM achieves higher accuracy in zero-shot generalization and cell-level counterfactual prediction than state-of-the-art methods, while providing interpretable control over biological concepts. We also show how counterfactual generation with scCBGM can be used in silico to probe the mechanism of action of cellular responses to chemical perturbation, by editing concepts that capture pathway activity scores mediating the response in real scRNA-seq data. scCBGM establishes a principled framework for high-fidelity in silico cellular experimentation and hypothesis testing in single-cell biology.
Submission Number: 44