Predicting Gene Expression in Spatially Resolved Transcriptomics Across Samples Through Probabilistic Fusion of Hierarchical Histology and Spatial Information

Yinbo Liu; QiWu; Keyang Ye; Xiao He; Tian Tian

20 Sept 2025 (modified: 14 Nov 2025), ICLR 2026 Conference Withdrawn Submission, CC BY 4.0

Keywords: Gene Expression Prediction, Cross-Slice Generalization, Variational Autoencoder, Spatially Resolved Transcriptomics

TL;DR: STevs: a deep generative model predicting gene expression from histology images through probabilistic fusion of hierarchical visual and spatial features, significantly improving cross-slice generalization and high-dimensional prediction

Abstract: Spatially resolved transcriptomics (SRT) is a transformative technology in biomedical research, yet its scalability is hindered by high costs and restricted capture areas. Computational methods for predicting high-quality gene expression are needed. However, existing methods are ineffective at predicting high-dimensional gene expression and generalizing to multiple spatial slices, primarily due to inter-sample heterogeneity and ineffective integration of visual and spatial information. To address these challenges, we propose STevs, a deep generative model designed to predict gene expression from tissue histology through a probabilistic fusion of image and spatial representations. STevs employs a multimodal variational autoencoder (VAE) architecture featuring parallel encoders that process distinct modalities: a Swin Transformer for hierarchical visual representation extraction and a multilayer perceptron (MLP) for spatial coordinates. The latent representations from these modalities are fused under uncertainty using a Product of Experts (PoE) mechanism. Furthermore, we introduce a latent alignment loss to explicitly promote a shared representation across modalities, thereby ensuring consistency between the image and spatial latent spaces. Comprehensive experimental evaluations demonstrate that STevs not only achieves state-of-the-art performance on standard within-slice gene prediction tasks but also significantly outperforms existing methods in the more challenging cross-slice prediction scenario. Our work provides a powerful computational tool capable of predicting gene expression directly from histology images, reducing the need for costly SRT experiments.
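The Product of Experts fusion mentioned in the abstract can be illustrated with a minimal NumPy sketch. It assumes each encoder outputs a diagonal Gaussian (a mean and a log-variance), which is the usual VAE setup but is not stated in the abstract. The function names `poe_fuse` and `symmetric_kl`, the optional standard-normal prior expert, and the choice of a symmetric KL divergence as the alignment term are all hypothetical; the abstract does not specify the form of the latent alignment loss, so this is one plausible choice rather than the authors' implementation.

```python
import numpy as np


def poe_fuse(mus, logvars, include_prior=True):
    """Fuse diagonal-Gaussian experts by multiplying their densities.

    mus, logvars: arrays of shape (n_experts, batch, latent_dim).
    Returns the mean and log-variance of the fused Gaussian.
    """
    mus = np.asarray(mus, dtype=float)
    logvars = np.asarray(logvars, dtype=float)
    if include_prior:
        # Assumption: include a standard-normal prior expert N(0, I),
        # as is common in multimodal VAEs; the abstract does not say so.
        zeros = np.zeros_like(mus[:1])
        mus = np.concatenate([zeros, mus], axis=0)
        logvars = np.concatenate([zeros, logvars], axis=0)
    precisions = np.exp(-logvars)              # 1 / sigma^2 per expert
    fused_var = 1.0 / precisions.sum(axis=0)   # precisions add up
    fused_mu = fused_var * (precisions * mus).sum(axis=0)
    return fused_mu, np.log(fused_var)


def symmetric_kl(mu_a, logvar_a, mu_b, logvar_b):
    """Hypothetical alignment term: symmetric KL between two diagonal
    Gaussians, summed over latent dimensions and averaged over the batch."""
    var_a, var_b = np.exp(logvar_a), np.exp(logvar_b)
    sq_diff = (mu_a - mu_b) ** 2
    kl_ab = 0.5 * (logvar_b - logvar_a + (var_a + sq_diff) / var_b - 1.0)
    kl_ba = 0.5 * (logvar_a - logvar_b + (var_b + sq_diff) / var_a - 1.0)
    return (kl_ab + kl_ba).sum(axis=-1).mean()


# Toy example: a confident image expert and a less certain spatial expert.
mu_img, logvar_img = np.array([[1.0]]), np.array([[0.0]])          # variance 1
mu_xy, logvar_xy = np.array([[3.0]]), np.array([[np.log(4.0)]])    # variance 4
fused_mu, fused_logvar = poe_fuse(
    [mu_img, mu_xy], [logvar_img, logvar_xy], include_prior=False
)
alignment = symmetric_kl(mu_img, logvar_img, mu_xy, logvar_xy)
```

In the toy example the fused mean is 1.4 and the fused variance is 0.8: the result sits closer to the more confident (lower-variance) expert and is tighter than either input, which is the sense in which PoE fuses modalities "under uncertainty".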

Supplementary Material: pdf

Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)

Submission Number: 24046
