SIGMMA: Hierarchical Graph-Based Multi-Scale Multi-modal Contrastive Alignment of Histopathology Image and Spatial Transcriptome
Keywords: spatial transcriptomics, histopathology, multi-modal contrastive learning, multi-scale representation learning, graph neural networks, H&E-ST alignment, cell-resolution imaging
TL;DR: We propose SIGMMA, a multi-scale contrastive framework that aligns H&E images and cell-resolution spatial transcriptomics over a hierarchical cell graph, reconciling the graph receptive field with the image region of interest (ROI) across modalities.
Abstract: Recent advances in computational pathology have leveraged vision–language models to learn joint representations of Hematoxylin and Eosin (H&E) images with spatial transcriptomic (ST) profiles, but existing approaches typically align H&E tiles and ST profiles at a single scale, overlooking fine-grained cellular structures and their spatial organization. We propose SIGMMA, a multi-modal contrastive alignment framework for learning hierarchical H&E-ST representations.
By enforcing multi-scale contrastive alignment, SIGMMA ensures coherent representations across modalities, while graph-based modeling of cell interactions integrates both inter- and intra-subgraph relationships to capture cellular organization from fine to coarse scales.
Across datasets, SIGMMA consistently improves gene-expression prediction and cross-modal retrieval performance, and its learned multi-scale embeddings recover tumor microenvironments and immune-exclusion programs in pancreatic cancer.
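The multi-scale contrastive alignment described in the abstract can be illustrated with a minimal sketch. It assumes a symmetric InfoNCE objective averaged over scales, which is a common choice for cross-modal alignment but is not confirmed by the abstract as the loss SIGMMA uses. The function names (info_nce, multi_scale_loss), the temperature value, and the equal default scale weights are hypothetical and chosen only for illustration; the image and ST embeddings are assumed to be precomputed, with row i of each matrix describing the same tissue region.

```python
import numpy as np


def _log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def info_nce(img, st, temperature=0.07):
    """Symmetric InfoNCE loss between paired image and ST embeddings.

    img, st: arrays of shape (n_pairs, dim); row i of each is a positive pair.
    The temperature is an assumed hyperparameter, not taken from the paper.
    """
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    st = st / np.linalg.norm(st, axis=1, keepdims=True)
    logits = img @ st.T / temperature  # cosine similarities, scaled
    idx = np.arange(len(img))
    # Image-to-ST direction: each row should pick out its own column.
    loss_i2s = -_log_softmax(logits, axis=1)[idx, idx].mean()
    # ST-to-image direction: each column should pick out its own row.
    loss_s2i = -_log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_i2s + loss_s2i)


def multi_scale_loss(img_by_scale, st_by_scale, weights=None):
    """Weighted average of per-scale contrastive losses.

    img_by_scale, st_by_scale: lists with one (n_pairs, dim) array per scale,
    ordered from fine to coarse. Equal weights are an assumption.
    """
    n_scales = len(img_by_scale)
    if weights is None:
        weights = [1.0 / n_scales] * n_scales
    losses = [info_nce(i, s) for i, s in zip(img_by_scale, st_by_scale)]
    return float(sum(w * l for w, l in zip(weights, losses)))
```

In this sketch each scale contributes its own contrastive term, so a mismatch at the fine (cell-level) scale is penalized even when coarse (region-level) embeddings already agree. How SIGMMA actually builds the per-scale embeddings from the hierarchical cell graph is not specified in the abstract and is left outside the example.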
Submission Number: 44