A Comprehensive Benchmark of Batch Integration Methods for Spatial Transcriptomics Using a Large-Scale Cancer Atlas

Published: 04 Mar 2026, Last Modified: 11 Mar 2026
Venue: ICLR 2026 Workshop LMRL Poster
License: CC BY 4.0
Confirmation: I have read and agree with the workshop's policy on behalf of myself and my co-authors.
Track: long paper (4–8 pages excluding references)
Keywords: Spatial Transcriptomics, integration, cancer biology, batch effect, out-of-domain generalization.
Abstract: Spatial transcriptomics (ST) enables spatially resolved gene expression measurement, providing insights into tissue architecture and disease biology. However, batch effects from sequencing protocols, sample processing, and other technical factors can confound biological signals. Although batch correction has been extensively studied in single-cell transcriptomics, spatial integration methods lack rigorous benchmarking on large real-world datasets. This study benchmarks 11 representation-learning methods across three categories (linear, graph-based, and probabilistic) using Owkin's MOSAIC Window dataset, a large-scale spatial transcriptomics atlas of human cancers. Methods are evaluated on three criteria: batch correction, biological conservation, and spatial conservation. We also propose a new integration metric to assess the robustness of representations to domain shifts and their generalizability to unseen samples. Probabilistic methods (scVIVA, scVI) outperform linear and graph-based approaches in batch correction and biological conservation. Graph-based methods, by contrast, excel at spatial conservation but underperform in batch integration. Out-of-distribution (OOD) evaluation reveals that the more complex methods show reduced performance on unseen samples, while linear methods maintain robust generalization, highlighting trade-offs between integration quality and generalizability that should guide method selection for real-world applications.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 53