Keywords: Computational Biology, Phenotypic screening, Foundation Models
TL;DR: PhenoSeq is a conditional diffusion model with a cross-attention denoising architecture that bridges two biological foundation models to generate transcriptomic profiles from morphological imaging.
Abstract: Transcriptomic profiling resolves mechanism-of-action signal at single-cell resolution but cannot match the scale or cost of morphological imaging. If the morphological fingerprint of a treated cell population carries structure that is recoverable in transcriptomic space, then every imaging experiment, spanning millions of cells at a fraction of the sequencing cost, becomes a latent source of molecular insight. We introduce PhenoSeq, a conditional diffusion model with a cross-attention denoising architecture that bridges two biological foundation models: a vision transformer for morphological features and a transcriptomic language model for single-cell gene expression. Trained under population-level supervision, PhenoSeq learns a conditional distribution over transcriptomic profiles given treatment-matched morphological observations. On a 28-compound treatment-identification benchmark, PhenoSeq-generated embeddings outperform raw imaging features in single-profile classification, and multimodal fusion recovers ≈29% of the gap to the real-transcriptomics ceiling; in the multi-profile setting, synthetic fusion more than doubles imaging-only balanced accuracy. An embedding-space fidelity analysis confirms that generated profiles localise to the correct treatment for the majority of conditions. These results demonstrate that generative cross-modal modelling from imaging to transcriptomics is both architecturally feasible and useful downstream in phenotypic drug discovery. Our code is available at: https://anonymous.4open.science/r/PhenoSeq-E3E0/README.md
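The two headline evaluation numbers rest on simple arithmetic. Below is a minimal sketch, assuming that "gap recovered" means the fraction of the distance between the imaging-only score and the real-transcriptomics ceiling that fusion closes, and that balanced accuracy is the unweighted mean of per-class recall. The helper names `balanced_accuracy` and `gap_recovered` and the toy labels and scores are hypothetical illustrations, not taken from the PhenoSeq code or results.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall (hypothetical helper)."""
    if not y_true:
        raise ValueError("empty label list")
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1          # count examples of each true class
        if t == p:
            hits[t] += 1        # count correct predictions per true class
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


def gap_recovered(imaging, fused, ceiling):
    """Fraction of the imaging-to-ceiling gap closed by fusion (assumed definition)."""
    if ceiling == imaging:
        raise ValueError("no gap to recover")
    return (fused - imaging) / (ceiling - imaging)


# Toy inputs with made-up labels and scores; these are NOT results from the paper.
y_true = ["A", "A", "A", "B", "C", "C"]
y_pred = ["A", "A", "B", "B", "C", "A"]
print(balanced_accuracy(y_true, y_pred))
print(gap_recovered(imaging=0.40, fused=0.50, ceiling=0.80))
```

Under this reading, a ≈29% figure would mean fusion closes a little under a third of the distance between imaging alone and real transcriptomics. Balanced accuracy is the natural choice when treatments have unequal numbers of profiles, since each class contributes equally regardless of size.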
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 28