Multimillion cell self-supervised representation learning enables organ-scale tissue niche discovery

ICML 2025 Workshop FM4LS Submission 23 Authors

Published: 12 Jul 2025, Last Modified: 12 Jul 2025 · FM4LS 2025 · CC BY 4.0
Keywords: Machine Learning, Neuroscience, Genomics, Self-supervised learning, Graph Transformer
TL;DR: CellTransformer is a self-supervised encoder–decoder model that scales to multimillion-cell spatial transcriptomics data, detecting fine-grained, spatially coherent tissue domains—recapitulating known CCF regions and revealing novel niches.
Abstract: Spatial transcriptomics (ST) offers unique opportunities to define the spatial organization of tissues and organs, such as the mouse brain. We establish a self-supervised workflow for spatial domain detection that scales to multimillion-cell, organ-scale ST datasets. The workflow learns latent representations of tissue niches using a novel encoder-decoder architecture, which we named CellTransformer, that hierarchically builds higher-order tissue features from lower-level cellular and molecular statistical patterns. CellTransformer effectively integrates cells across tissue sections, identifying domains highly similar to those in existing ontologies such as the Allen Mouse Brain Common Coordinate Framework (CCF) while enabling discovery of hundreds of uncataloged areas with minimal loss of domain spatial coherence. CellTransformer advances the state of the art in spatial transcriptomics by providing a performant solution for detecting fine-grained tissue domains from ST data.
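
The abstract describes the architecture only at a high level. The sketch below illustrates the general pattern of a self-supervised niche encoder-decoder: encode a cell's spatial neighbors with a transformer, pool them into a niche embedding, and decode the held-out center cell's expression. Every concrete choice here (mean pooling, a masked-center MSE objective, layer sizes) is an assumption for illustration, not the published CellTransformer implementation.

```python
import torch
import torch.nn as nn

class NeighborhoodEncoderDecoder(nn.Module):
    """Hypothetical sketch of a self-supervised encoder-decoder over a
    cell neighborhood: embed neighbor expression profiles, encode them
    with a transformer, pool to a niche embedding, and reconstruct the
    held-out center cell's expression. Not the paper's architecture."""

    def __init__(self, n_genes: int, d_model: int = 128,
                 n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        self.embed = nn.Linear(n_genes, d_model)  # per-cell counts -> token
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, n_layers)
        self.decoder = nn.Sequential(  # niche embedding -> expression
            nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, n_genes)
        )

    def forward(self, neighbor_expr: torch.Tensor):
        # neighbor_expr: (batch, n_neighbors, n_genes), the expression of
        # cells surrounding a masked center cell
        tokens = self.embed(neighbor_expr)
        hidden = self.encoder(tokens)
        niche = hidden.mean(dim=1)   # pooled niche representation
        recon = self.decoder(niche)  # predicted center-cell expression
        return niche, recon

# Toy usage: reconstruct the held-out center cell from its neighbors.
model = NeighborhoodEncoderDecoder(n_genes=500)
neighbors = torch.rand(8, 32, 500)            # 8 niches, 32 neighbors each
center = torch.rand(8, 500)                   # held-out center-cell profiles
niche, recon = model(neighbors)
loss = nn.functional.mse_loss(recon, center)  # self-supervised objective
loss.backward()
```

After training, the pooled niche embeddings could be clustered (e.g., with k-means) to yield candidate spatial domains; the number of clusters would control the granularity of the resulting parcellation.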
Submission Number: 23