Keywords: single-cell RNA-seq, counterfactual generation, editing, concept bottleneck models, generative modeling, CBGM, flow matching, interpretable control, cellular perturbations, zero-shot generalization, interventions
TL;DR: We introduce scCBGM, a generative framework for single-cell RNA-seq that enables precise, interpretable counterfactual editing and robust generalization to unseen conditions.
Abstract: How would a cell behave under different conditions? Counterfactual editing of single cells is essential for understanding biology and designing targeted therapies, yet current scRNA-seq generative methods fall short: disentanglement models rarely support interventions, and most intervention-based approaches perform conditional generation that synthesizes new cells rather than editing existing ones. We introduce Single-Cell Concept Bottleneck Generative Models (scCBGMs), which unify counterfactual reasoning and generative modeling. scCBGM incorporates decoder skip connections and a cross-covariance penalty to decouple annotated concepts from unannotated sources of variation, enabling robust counterfactuals even under noisy concept annotations. Using an abduction–action–prediction procedure, we edit cells at the concept level with per-cell precision and generalize zero-shot to unseen concept combinations. Conditioning modern generators (e.g., flow matching) on scCBGM embeddings preserves state-of-the-art fidelity while providing precise controllability. Across three datasets (up to 21 cell types), scCBGM improves counterfactual accuracy by up to 4×. It also supports mechanism-of-action analyses by jointly editing perturbation and pathway-activity concepts in real scRNA-seq data. Overall, scCBGM establishes a principled framework for high-fidelity in silico cellular experimentation and hypothesis testing in single-cell biology.
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 22838
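Illustrative note: the abstract mentions a cross-covariance penalty that decouples annotated concepts from unannotated variation, but this listing does not give its exact form. The sketch below shows one common formulation, the squared Frobenius norm of the batch cross-covariance between concept activations and a residual latent. The function name, array shapes, and toy data are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def cross_covariance_penalty(concepts, residual):
    """Squared Frobenius norm of the batch cross-covariance matrix.

    concepts: (n_cells, n_concepts) concept activations (hypothetical name).
    residual: (n_cells, n_latent) unannotated latent code (hypothetical name).
    A value near zero means the two blocks are linearly uncorrelated.
    """
    # Center each feature over the batch.
    c = concepts - concepts.mean(axis=0, keepdims=True)
    u = residual - residual.mean(axis=0, keepdims=True)
    # Cross-covariance matrix of shape (n_concepts, n_latent).
    cov = c.T @ u / concepts.shape[0]
    return float(np.sum(cov ** 2))


# Toy data (assumed): 512 cells, 3 concepts, 4 residual dimensions.
rng = np.random.default_rng(0)
concepts = rng.normal(size=(512, 3))
independent = rng.normal(size=(512, 4))

# A residual that leaks the first concept into its first dimension.
leaky = independent.copy()
leaky[:, 0] += 2.0 * concepts[:, 0]

penalty_independent = cross_covariance_penalty(concepts, independent)
penalty_leaky = cross_covariance_penalty(concepts, leaky)
print(penalty_independent, penalty_leaky)
```

In this toy setup the leaky residual receives a much larger penalty than the independent one, which is the behavior a training objective would exploit to push concept information out of the unannotated latent. Note that such a penalty only discourages linear dependence between the two blocks.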