LEARNING DISCRETE REPRESENTATIONS TO UNDERSTAND AND PREDICT TISSUE BIOLOGY

Arpit Merchant; Sebastian Birk; Amirhossein Vahidi; Daniyal Jafree; Lloyd Steele; Batuhan Cakir; April Rose Foster; Vijaya Baskar MS; Muzlifah Haniffa; Mohammad Lotfollahi

19 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. License: CC BY 4.0

Keywords: Spatial Transcriptomics, Graph Machine Learning, AI for Science, Computational Biology

TL;DR: Tokenization Scheme for Tissues using Spatially-Resolved Transcriptomics

Abstract: Learning tissue-level representations that capture the organization of entire tissues while preserving cellular and microenvironmental detail is a central challenge in spatial biology. Graph autoencoders have been employed to learn spatially aware continuous representations, but they have limited utility for tissue-level generation, lack inherent interpretability for biological analysis, and are not readily reusable across contexts and modeling architectures. To address these limitations, we present SQUINT, a discrete representation learning framework for spatially-resolved transcriptomics that encodes tissues into a finite vocabulary of interpretable discrete codes. SQUINT achieves this by combining graph neural networks with vector quantization, conditioning on relative spatial distances, and employing a masking strategy during training. Cells are then represented by assignments to this shared vocabulary, allowing whole tissues to be modeled as sequences of discrete tokens. At inference, SQUINT codes enable gene expression imputation at arbitrary spatial locations, outperforming state-of-the-art generative methods across diverse datasets. Further, we demonstrate that these discrete tokens are interpretable: they capture meaningful tissue structures beyond individual cells and reflect recurrent microenvironmental organization patterns, as shown through downstream applications including 3D imputation, tumour stratification, and perturbation analysis.
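
Illustration (not from the paper): the abstract describes representing cells by assignments to a shared vocabulary of discrete codes via vector quantization. The minimal sketch below shows only the generic nearest-codebook lookup that vector quantization relies on; it is not SQUINT's implementation. The function name `quantize`, the toy codebook, and the toy embeddings are all hypothetical, and the graph neural network encoder, spatial conditioning, and masking are omitted.

```python
import numpy as np


def quantize(embeddings, codebook):
    """Assign each embedding to its nearest codebook vector (Euclidean distance).

    embeddings: array of shape (n_cells, dim), e.g. outputs of some encoder
    codebook:   array of shape (n_codes, dim), the shared discrete vocabulary
    Returns the code index per cell and the corresponding codebook vectors.
    """
    # Pairwise squared distances via broadcasting: shape (n_cells, n_codes)
    diffs = embeddings[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)
    codes = np.argmin(dists, axis=1)
    return codes, codebook[codes]


# Hypothetical toy vocabulary: 8 codes in 4 dimensions (+/- unit vectors)
codebook = np.vstack([np.eye(4), -np.eye(4)])

# Hypothetical cell embeddings: slightly shifted copies of codes 2, 5, 5, 0
true_codes = np.array([2, 5, 5, 0])
embeddings = codebook[true_codes] + 0.05

codes, quantized = quantize(embeddings, codebook)
print(codes.tolist())  # [2, 5, 5, 0]
```

Under these assumptions, a tissue becomes a sequence of integer tokens (`codes`), one per cell, which is the kind of representation a downstream sequence model could consume.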

Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)

Submission Number: 18649
