HiBio-ST: A Hierarchical Multimodal Foundation Model with Biological Prior Anchors for Spatial Transcriptomics
Keywords: Spatial Transcriptomics, Foundation Model, Computational Pathology
TL;DR: HiBio-ST is a novel hierarchical multimodal foundation model with biological prior anchors for spatial transcriptomics modeling
Abstract: Spatial transcriptomics (ST) enables medical computer vision researchers to uncover the molecular relationships underlying tissue morphology. However, most existing vision–omics models are built on limited and homogeneous datasets, which makes them task-specific and limits their generalizability. Recent multimodal foundation models attempt to bridge histology and gene expression via contrastive objectives; however, they fail to effectively model spot-specific molecular context and overlook spatial dependencies by treating each spot–patch pair in isolation. To bridge these gaps, we present HiBio-ST, a novel hierarchical multimodal foundation model guided by biological prior anchors for ST analysis. HiBio-ST employs a progressive multi-level alignment pretraining pipeline to harmonize visual context with molecular identities. A TF–IDF reweighting strategy is first applied to highlight spatially informative “keyword” genes within ST profiles, reducing the dominance of ubiquitous housekeeping signals. Curated pathway anchors are then incorporated to inject global biological knowledge into the representation space. Moreover, hierarchical region-aware clustering unites contiguous meso-scale regions into coherent structural patterns, allowing the model to capture higher-order spatial organization. We evaluate HiBio-ST on four downstream tasks across multiple datasets. Experimental results demonstrate that HiBio-ST consistently achieves state-of-the-art performance, underscoring its broad applicability in spatial transcriptomics modeling.
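Illustrative sketch (not the authors' implementation): the abstract does not give the exact TF–IDF formulation, so the snippet below only shows the general idea under stated assumptions. Each spot is treated as a "document" and each gene as a "term"; term frequency is the gene's share of the spot's total counts, and a smoothed inverse document frequency down-weights genes detected in nearly every spot. The function names `tfidf_reweight` and `top_keyword_genes`, the smoothing choice, and the toy count matrix are all hypothetical.

```python
import numpy as np

def tfidf_reweight(counts):
    """Reweight a (n_spots, n_genes) count matrix with a TF-IDF scheme.

    Assumption: spots play the role of documents and genes the role of terms.
    """
    counts = np.asarray(counts, dtype=float)
    n_spots = counts.shape[0]
    # Term frequency: each gene's fraction of the spot's total counts.
    totals = counts.sum(axis=1, keepdims=True)
    tf = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
    # Document frequency: number of spots in which the gene is detected.
    df = (counts > 0).sum(axis=0)
    # Smoothed IDF (an assumed variant): genes present in every spot get 0.
    idf = np.log((1.0 + n_spots) / (1.0 + df))
    return tf * idf

def top_keyword_genes(weights, gene_names, k=2):
    """Return the k highest-weighted gene names for each spot."""
    order = np.argsort(-weights, axis=1)[:, :k]
    return [[gene_names[j] for j in row] for row in order]

# Toy data: ACTB and GAPDH are expressed everywhere (housekeeping-like),
# while EPCAM and CD3E are each detected in a single spot.
genes = ["ACTB", "GAPDH", "EPCAM", "CD3E"]
counts = np.array([
    [50, 40, 10, 0],
    [60, 30, 0, 10],
    [55, 35, 0, 0],
])
weights = tfidf_reweight(counts)
keywords = top_keyword_genes(weights, genes, k=1)
```

In this toy case the ubiquitous genes receive zero weight despite having the highest raw counts, while the spot-restricted genes become the top "keyword" for their spots. A real pipeline would likely differ in normalization and smoothing.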
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 1156