Towards Cross-Sample Alignment for Multi-Modal Representation Learning in Spatial Transcriptomics

Published: 04 Mar 2026, Last Modified: 07 Mar 2026
Venue: ICLR 2026 Workshop LMRL Poster
License: CC BY 4.0
Confirmation: I have read and agree with the workshop's policy on behalf of myself and my co-authors.
Track: long paper (4–8 pages excluding references)
Keywords: Spatial transcriptomics, representation learning, batch integration, multimodal data
TL;DR: We present a system that integrates transcriptomics, morphology, and spatial context across samples, enabling robust cross-sample alignment and multi-modal representations that reveal conserved cellular programs and spatial niches in diverse tissues.
Abstract: The growing number of spatial transcriptomics (ST) datasets enables comprehensive multi-modal characterization of cell types across diverse biological and clinical contexts. However, integration across patient cohorts remains challenging, as the local microenvironment, patient-specific variability, and technical batch effects can dominate the biological signal. Here, we hypothesize that combining specialized transcriptomics correction methods with deep representation learning can jointly align morphology, transcriptomics, and spatial information across multiple tissue samples. This approach benefits from recent transcriptomics and pathology foundation models, projecting cells into a shared embedding space where they cluster by cell type rather than by dataset-specific conditions. Applying this framework to 18 skin melanoma, 12 human brain, and 4 lung cancer datasets, we demonstrate that it outperforms conventional batch-correction approaches by 58%, 38%, and 2-fold, respectively. Together, these results show that the framework enables efficient integration of ST data across modalities and samples, facilitating the systematic discovery of conserved cellular programs and spatial niches while remaining robust to cohort-specific batch effects.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 3