Cell Painting Generates Single-Cell Transcriptomics via Conditional Diffusion

Reed Naidoo; Jingyu Hu; Giuseppe Tripodi; Chris Bakal; Tapabrata Chakraborti

Published: 28 May 2026, Last Modified: 11 Jun 2026

Venue: ICML 2026 FM4LS Workshop Poster

License: CC BY 4.0

Keywords: Computational Biology, Phenotypic screening, Foundation Models

TL;DR: PhenoSeq is a conditional diffusion model with a cross-attention denoising architecture that bridges two biological foundation models.
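As a rough illustration of what "conditional diffusion with cross-attention denoising" can mean, the sketch below noises a transcriptomic embedding with the standard DDPM forward step and lets the noisy vector attend over a set of morphology tokens. This is a generic, dependency-free toy and not PhenoSeq's implementation: the names `cross_attention`, `add_noise`, `rna_latent`, and `image_tokens` are hypothetical, the embeddings are random stand-ins for foundation-model outputs, and the learned query/key/value projections and the trained denoiser itself are omitted.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def cross_attention(query, context):
    """Single-head attention: one query vector attends over context tokens.

    Assumption: learned projections are omitted, so the context tokens
    serve directly as both keys and values.
    """
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, tok) * scale for tok in context])
    dim = len(context[0])
    attended = [sum(w * tok[i] for w, tok in zip(weights, context))
                for i in range(dim)]
    return attended, weights

def add_noise(x0, alpha_bar, rng):
    """DDPM forward step: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(x0, eps)], eps

rng = random.Random(0)
dim = 4
# Random stand-ins (hypothetical) for a transcriptomic embedding and
# three morphology tokens from an image encoder.
rna_latent = [rng.gauss(0, 1) for _ in range(dim)]
image_tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(3)]

noisy, eps = add_noise(rna_latent, alpha_bar=0.5, rng=rng)
context_vec, attn = cross_attention(noisy, image_tokens)

# A trained denoiser would map (noisy, context_vec, timestep) to a
# prediction of eps; here we only assemble its conditioned input.
denoiser_input = noisy + context_vec
```

The design point is that the imaging modality enters only through the attention context, so the same denoiser can in principle be conditioned on any number of morphology tokens per treatment.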

Abstract: Transcriptomic profiling resolves mechanism-of-action signal at single-cell resolution, but cannot match the scale or cost-efficiency of morphological imaging. If the morphological fingerprint of a treated cell population carries recoverable structure in transcriptomic space, every imaging experiment, spanning millions of cells at a fraction of the sequencing cost, becomes a latent source of molecular insight. We introduce PhenoSeq, a conditional diffusion model with a cross-attention denoising architecture that bridges two biological foundation models: a vision transformer for morphological features and a transcriptomic language model for single-cell gene expression. Operating under population-level supervision, it learns a conditional distribution over transcriptomic profiles from treatment-matched morphological observations. Evaluated on a 28-compound treatment-identification benchmark, PhenoSeq-generated embeddings outperform raw imaging in single-profile classification, and multimodal fusion recovers ≈29% of the gap to the real-transcriptomics ceiling; in the multi-profile setting, synthetic fusion more than doubles imaging-only balanced accuracy. Embedding-space fidelity analysis confirms correct treatment localisation for the majority of conditions. These results demonstrate that generative cross-modal modelling from imaging to transcriptomics is both architecturally feasible and downstream-useful in phenotypic drug discovery. Our code is available at: https://anonymous.4open.science/r/PhenoSeq-E3E0/README.md
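The two headline quantities in the abstract, balanced accuracy and the fraction of the imaging-to-ceiling gap recovered, can be made concrete with a small sketch. It assumes the usual definitions: balanced accuracy as the mean of per-class recall, and gap recovery as (fused − imaging) / (ceiling − imaging). The function names and every number below are hypothetical placeholders, not values reported by the paper.

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each treatment counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)

def gap_recovered(imaging, fused, ceiling):
    """Fraction of the imaging-to-ceiling gap closed by the fused model."""
    if ceiling <= imaging:
        raise ValueError("ceiling must exceed the imaging baseline")
    return (fused - imaging) / (ceiling - imaging)

# Toy labels (hypothetical): recall for A is 2/3, recall for B is 1.
y_true = ["A", "A", "A", "B"]
y_pred = ["A", "A", "B", "B"]
toy_bacc = balanced_accuracy(y_true, y_pred)

# Placeholder scores (hypothetical, NOT from the paper), chosen only to
# show how a recovery figure near 29% could arise: 0.10 / 0.35.
toy_gap = gap_recovered(imaging=0.30, fused=0.40, ceiling=0.65)
```

Under this reading, a recovery of ≈29% means the fused model closes a little under a third of the distance between imaging-only performance and the performance obtained with real transcriptomic profiles.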

Submission Number: 28
