GenST: A Generative Cross-Modal Model for Predicting Spatial Transcriptomics from Histology Images

Published: 22 Jul 2025, Last Modified: 11 Aug 2025 · COMPAYL 2025 · CC BY 4.0
Keywords: computational pathology, spatial transcriptomics, multimodal AI
TL;DR: We propose a generative cross-modal model for predicting spatial transcriptomics from histology images, training an end-to-end model for each modality and aligning their latent spaces to preserve information from both modalities.
Abstract: Spatial transcriptomics measures gene expression levels at specific locations across a tissue sample, preserving spatial information in cancerous tissue that is important for downstream clinical decision making. However, this technology is currently too expensive for routine clinical pathways. In contrast, digital images of haematoxylin and eosin stained histology slides are routinely generated from tissue biopsy samples. Here, we develop a generative cross-modal method to predict spatial transcriptomics from histology images by aligning the latent spaces of two VQ-VAEs, one per modality. We benchmark our approach on multiple sequencing technologies (Visium and ST) and cancer types (breast, brain, spinal cord and skin) from two public datasets, using 142 slides with 820,407 spots from STImage-1K4M (Chen et al., 2024a) and 568 slides with 254,812 spots from HEST-1k (Jaume et al., 2024). Our model outperforms state-of-the-art models in half of the resulting cohorts, whilst providing an interpretable framework for understanding which gene expression patterns of a cancer tumour can be captured from the morphology observed at corresponding locations in the histology image.
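The abstract does not specify how the two latent spaces are aligned. As a loose illustration only, the core idea of matching paired latents from two per-modality encoders can be sketched with a simple mean-squared-error alignment objective; the function names, latent shapes, and the choice of MSE here are assumptions for illustration, not the paper's actual method:

```python
import numpy as np

def alignment_loss(z_img: np.ndarray, z_st: np.ndarray) -> float:
    """Mean squared error between paired latent codes from the
    histology encoder and the spatial-transcriptomics encoder.
    A hypothetical stand-in for the paper's alignment objective."""
    return float(np.mean((z_img - z_st) ** 2))

# Toy paired latents: one row per spot, one column per latent dimension.
rng = np.random.default_rng(0)
z_img = rng.standard_normal((4, 16))               # histology-image latents
z_st = z_img + 0.1 * rng.standard_normal((4, 16))  # noisy paired ST latents

loss = alignment_loss(z_img, z_st)  # small but non-zero for noisy pairs
```

In a full system such a term would typically be minimised jointly with each VQ-VAE's own reconstruction loss, so that the shared latent space retains information from both modalities.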
Submission Number: 5