Keywords: Spatial Omics, GNNs, Self-supervised Learning, Data Augmentation
TL;DR: Systematic study of graph augmentations for self-supervised GNN pretraining in spatial omics.
Abstract: Spatial omics technologies provide rich insights into biological processes by jointly capturing molecular profiles and the spatial organization of cells. The resulting high-dimensional data can be naturally represented as graphs, where Graph Neural Networks (GNNs) offer an effective framework to model interactions in the tissue. Self-supervised pretraining methods such as Bootstrapped Graph Latents (BGRL) and GRACE leverage graph augmentations to learn invariances without requiring costly labels. Yet, the design of augmentation strategies remains underexplored, particularly in the context of spatial omics. In this work, we systematically investigate how different graph augmentations affect embedding quality and downstream performance in spatial omics. We evaluate a suite of existing and novel augmentations, including transformations tailored to biological variation, across two representative tasks: unsupervised domain identification in healthy tissue and supervised phenotype prediction in cancer tissue. Our results show that carefully chosen augmentations substantially improve performance, whereas poorly aligned or overly complex augmentations provide little benefit or even degrade performance. These findings highlight the central role of augmentation design in enforcing meaningful invariances for graph contrastive pretraining in spatial omics.
Submission Number: 88
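To make the augmentation step mentioned in the abstract concrete, the sketch below shows how two stochastic views of a cell graph can be generated via random edge dropping and feature masking, the standard perturbations used by GRACE and BGRL. This is a minimal illustration, not the authors' implementation; the function name and the probabilities p_edge and p_feat are hypothetical.

```python
# Minimal sketch (not the paper's code): two augmented views of a tissue graph
# in the style of GRACE/BGRL, assuming an edge list and a cell-by-gene matrix.
import numpy as np

def augment_view(edges, features, p_edge=0.2, p_feat=0.2, rng=None):
    """Return one stochastic view: drop edges and mask feature dimensions."""
    rng = rng or np.random.default_rng()
    # Randomly drop a fraction of edges (edge dropping).
    keep = rng.random(len(edges)) >= p_edge
    edges_aug = edges[keep]
    # Randomly zero out a fraction of feature columns (feature masking).
    mask = rng.random(features.shape[1]) >= p_feat
    feats_aug = features * mask  # broadcast over all nodes
    return edges_aug, feats_aug

# Two views of the same graph feed the two branches of a contrastive (GRACE)
# or bootstrapped (BGRL) objective.
edges = np.array([[0, 1], [1, 2], [2, 0], [0, 3]])
feats = np.random.default_rng(0).random((4, 8))  # 4 cells x 8 genes
view1 = augment_view(edges, feats, rng=np.random.default_rng(1))
view2 = augment_view(edges, feats, rng=np.random.default_rng(2))
```

Biologically tailored augmentations of the kind studied in the paper would replace or extend these generic perturbations while keeping the same two-view setup.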