Abstract: Generative models for single-cell RNA sequencing (scRNA-seq) data have advanced rapidly, yet their relative strengths, limitations, and biological faithfulness remain poorly understood. In particular, recent diffusion and transformer models have not been systematically compared with each other, and standard generation metrics may fail to capture biologically relevant behavior. In this work, we present a benchmark of recent generative models for conditional scRNA-seq generation: a diffusion model (scDiffusion), a transformer model (C2S-Scale), a variational autoencoder (SCVI, which we adapt for conditional generation), and a decoder-only transformer, proposed here, that unlike C2S-Scale preserves exact count information. We evaluate the four models using a diverse set of distributional, discriminability, and gene-level metrics, and propose the recovery of differentially expressed genes as a biologically meaningful downstream task. While all models generate visually and distributionally realistic cells, with the diffusion model being the weakest, we find that no method reliably preserves the log-fold changes required for standard differential gene expression (DGE) analysis, revealing a systematic attenuation of biological signal.
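To make "attenuation of log-fold changes" concrete, the following is a minimal sketch, not the benchmark's actual evaluation protocol. It assumes raw count matrices with cells as rows and genes as columns, and uses a simple mean-based log2 fold change with a pseudocount rather than a full statistical DGE test. The helper names (`log2_fold_change`, `attenuation_slope`, `top_k_recovery`) and the toy data are hypothetical, chosen only for illustration.

```python
import numpy as np

def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """Per-gene log2 fold change of mean expression, group_a vs. group_b.

    Assumption: inputs are (n_cells, n_genes) count matrices.
    """
    mean_a = np.asarray(group_a, dtype=float).mean(axis=0)
    mean_b = np.asarray(group_b, dtype=float).mean(axis=0)
    return np.log2((mean_a + pseudocount) / (mean_b + pseudocount))

def attenuation_slope(lfc_real, lfc_gen):
    """Least-squares slope (through the origin) of generated LFCs on real LFCs.

    A slope below 1 means the generated data systematically shrinks effects.
    """
    lfc_real = np.asarray(lfc_real, dtype=float)
    lfc_gen = np.asarray(lfc_gen, dtype=float)
    return float(np.dot(lfc_real, lfc_gen) / np.dot(lfc_real, lfc_real))

def top_k_recovery(lfc_real, lfc_gen, k):
    """Fraction of the k genes with largest |LFC| in real data that are also top-k in generated data."""
    top_real = set(np.argsort(-np.abs(lfc_real))[:k].tolist())
    top_gen = set(np.argsort(-np.abs(lfc_gen))[:k].tolist())
    return len(top_real & top_gen) / k

# Hypothetical toy data: 3 genes, two conditions (a, b), real vs. generated cells.
real_a = np.array([[8, 2, 4], [8, 2, 4]])
real_b = np.array([[2, 8, 4], [2, 8, 4]])
gen_a = np.array([[5, 3, 4]])  # same direction of change, but weaker
gen_b = np.array([[3, 5, 4]])

lfc_real = log2_fold_change(real_a, real_b)
lfc_gen = log2_fold_change(gen_a, gen_b)
slope = attenuation_slope(lfc_real, lfc_gen)
recovery = top_k_recovery(lfc_real, lfc_gen, k=2)
print(f"slope={slope:.3f}, top-2 recovery={recovery:.2f}")
```

In this toy case the generated cells recover the same top genes (recovery of 1.0) while the slope is well below 1. This illustrates how a model can look faithful at the level of gene rankings yet still attenuate effect sizes, which is the kind of failure a ranking-only metric would miss.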
Submission Number: 57