Building Foundation Models to Characterize Cellular Interactions via Geometric Self-Supervised Learning on Spatial Genomics
Track: Main track (up to 8 pages)
Abstract: Cellular interactions form the fundamental circuits that drive development, physiology, and disease within tissues. Advances in spatial genomics (SG) and artificial intelligence (AI) offer unprecedented opportunities to computationally analyze and predict the behavior of intricate cellular networks, and to identify interactions that drive disease states.
However, challenges arise in both \textit{methodology} and \textit{scalability}: \textbf{(i)} how to computationally characterize
complicated cellular interactions of multi-scale nature, where chemical genes/circuits in individual cells process information and drive interactions among large numbers of diverse cell types,
and \textbf{(ii)} how to scale up the pipeline to accommodate the increasing volumes of SG data that map transcriptome-scale gene expression and spatial proximity across millions of cells.
In this paper, we introduce the \textbf{Cellular Interaction Foundation Model} (\textbf{CIFM}), an AI foundation model designed to analyze and simulate cellular interactions within living tissues.
In the CIFM pipeline, we explicitly capture and embed interactions of cells within microenvironments using a scalable geometric graph neural network, and we optimize the characterization of cellular interactions with a novel self-supervised learning objective: the model is trained to infer the gene expression of cells from their surrounding microenvironments.
The resulting CIFM has 100 million parameters and is trained on SG data from 23 million cells.
Our benchmarking experiments show that CIFM effectively infers gene expression conditional on microenvironmental context:
we achieve a high correlation and a low mismatch error, with 71.4\% of cells on Visium-HD receiving similar cell-type annotations from their predicted and actual expression.
We demonstrate the downstream utility of CIFM in two ways: (i) applying CIFM to embed tumor samples so as to capture cellular interactions within tumor microenvironments (ROC-AUC of 0.862 when classifying sample conditions via linear probing on the embeddings) and to identify shared signatures across samples; and (ii) using CIFM to simulate changes in microenvironmental composition in response to T cell infiltration, which highlights how CIFM can be leveraged to model cellular responses to tissue perturbations, an essential step toward constructing ``AI virtual tissues''.
Our model is open source and publicly accessible at \url{https://huggingface.co/ynyou/CIFM}.
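The self-supervised objective described in the abstract (inferring a cell's gene expression from its surrounding microenvironment) can be illustrated with a minimal NumPy sketch. This is not the CIFM architecture, which the abstract does not specify beyond "geometric graph neural network": here a distance-weighted average over the k nearest neighbors followed by a linear readout `W` stands in for the network. The function names, the choice of k, the `exp(-distance)` weighting, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def knn_neighbors(coords, k):
    """Indices of the k nearest cells to each cell, excluding the cell itself."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a cell is never its own neighbor
    return np.argsort(d, axis=1)[:, :k]

def microenvironment_features(coords, expr, nbrs):
    """Distance-weighted mean of neighbor expression.

    Only pairwise distances are used, so the features do not change when
    the tissue is rotated or translated (assumed stand-in for a geometric GNN).
    """
    dist = np.linalg.norm(coords[nbrs] - coords[:, None, :], axis=-1)  # (n, k)
    w = np.exp(-dist)                      # assumed weighting, length scale 1
    w /= w.sum(axis=1, keepdims=True)
    return np.einsum("nk,nkg->ng", w, expr[nbrs])  # (n, genes)

def masked_expression_loss(coords, expr, W, k=5):
    """Mean squared error when predicting each cell's expression from its
    neighbors only; the cell's own expression is hidden from the predictor."""
    nbrs = knn_neighbors(coords, k)
    context = microenvironment_features(coords, expr, nbrs)
    pred = context @ W                     # hypothetical linear readout
    return float(np.mean((pred - expr) ** 2))

# Toy data: 50 cells in 2D with 8 genes (synthetic, for illustration only).
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 1.0, size=(50, 2))
expr = rng.poisson(3.0, size=(50, 8)).astype(float)
W = np.eye(8)
loss = masked_expression_loss(coords, expr, W)
print(f"masked-expression MSE on toy data: {loss:.3f}")
```

In an actual training loop, `W` (or the parameters of a real graph network) would be fitted by minimizing this loss over many tissue sections; the sketch only shows how the prediction target and the neighbor-only context fit together.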
Submission Number: 64