LEARNING DISCRETE REPRESENTATIONS TO UNDERSTAND AND PREDICT TISSUE BIOLOGY

ICLR 2026 Conference Submission18649 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Spatial Transcriptomics, Graph Machine Learning, AI for Science, Computational Biology
TL;DR: Tokenization Scheme for Tissues using Spatially-Resolved Transcriptomics
Abstract: Learning tissue-level representations that capture the organization of entire tissues while preserving cellular and microenvironmental detail is a central challenge in spatial biology. Graph autoencoders have been used to learn spatially aware continuous representations, but these representations have limited utility for tissue-level generation, lack inherent interpretability for biological analysis, and are not readily reusable across contexts and model architectures. To address this challenge, we present SQUINT, a discrete representation learning framework for spatially-resolved transcriptomics that encodes tissues into a finite vocabulary of interpretable discrete codes. SQUINT combines graph neural networks with vector quantization, conditions on relative spatial distances, and employs a masking strategy during training. Cells are then represented by their assignments to this shared vocabulary, allowing whole tissues to be modeled as sequences of discrete tokens. At inference time, SQUINT codes enable gene expression imputation at arbitrary spatial locations, outperforming state-of-the-art generative methods across diverse datasets. Further, through downstream applications including 3D imputation, tumour stratification, and perturbation analysis, we show that these discrete tokens are interpretable: they capture meaningful tissue structures beyond individual cells and reflect recurrent microenvironmental organization patterns.
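The abstract describes the cell-to-token step only at a high level. The sketch below illustrates the general idea, not SQUINT's actual implementation, and rests on several assumptions. A toy distance-weighted neighbor average stands in for the graph neural network. The codebook is fixed and already learned. Each cell receives exactly one code. The names `encode`, `quantize`, `radius`, and `sigma` are hypothetical, and the training components (codebook learning, the masking strategy, and any losses) are omitted.

```python
import math


def encode(positions, features, radius=1.5, sigma=1.0):
    """Hypothetical stand-in for a GNN encoder layer (not SQUINT's model).

    Each cell's embedding is a weighted mean of its own features and those of
    every cell within `radius`, with weight exp(-d**2 / sigma**2) computed from
    the relative spatial distance d between the two cells.
    """
    embeddings = []
    for xi, yi in positions:
        total_w = 0.0
        acc = [0.0] * len(features[0])
        for (xj, yj), feat in zip(positions, features):
            d = math.hypot(xi - xj, yi - yj)
            if d <= radius:  # the cell itself (d = 0) is always included
                w = math.exp(-(d ** 2) / sigma ** 2)
                total_w += w
                acc = [a + w * f for a, f in zip(acc, feat)]
        embeddings.append([a / total_w for a in acc])
    return embeddings


def quantize(embeddings, codebook):
    """Vector quantization: index of the nearest codebook vector per embedding
    (squared Euclidean distance; ties go to the lowest index)."""
    codes = []
    for z in embeddings:
        dists = [sum((a - b) ** 2 for a, b in zip(z, c)) for c in codebook]
        codes.append(min(range(len(codebook)), key=lambda k: dists[k]))
    return codes


# Toy tissue: two spatially separated pairs of cells, two genes each.
positions = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0)]
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
codebook = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # assumed already learned

# Tokens are listed in input cell order (an arbitrary choice for this sketch).
tokens = quantize(encode(positions, features), codebook)
print(tokens)  # [0, 0, 1, 1]
```

Every cell is mapped to an entry of one shared codebook, so the two spatially separated cell groups in this toy example receive distinct codes. The whole tissue then reduces to the token sequence `[0, 0, 1, 1]`, which is the kind of discrete input a sequence model could consume.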
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 18649