SAINT: Sequence-Aware Integration for Spatial Transcriptomics Multi-View Clustering

Zeyu Zhu; KE LIANG; Lingyuan Meng; Meng Liu; Suyuan Liu; Renxiang Guan; Miaomiao Li; Wanwei Liu; Xinwang Liu

Published: 18 Sept 2025, Last Modified: 29 Oct 2025. Venue: NeurIPS 2025 poster. License: CC BY 4.0

Keywords: Multi-view Clustering, Graph Learning, AI4SCI

Abstract: Spatial transcriptomics (ST) technologies provide gene expression measurements with spatial resolution, enabling the dissection of tissue structure and function. A fundamental challenge in ST analysis is clustering spatial spots into coherent functional regions. While existing models effectively integrate expression and spatial signals, they largely overlook sequence-level biological priors encoded in the DNA sequences of expressed genes. To bridge this gap, we propose SAINT (Sequence-Aware Integration for Nucleotide-informed Transcriptomics), a unified framework that augments spatial representation learning with nucleotide-derived features. We construct sequence-augmented datasets across 14 tissue sections from three widely used ST benchmarks (DLPFC, HBC, and MBA), retrieving reference DNA sequences for each expressed gene and encoding them using a pretrained Nucleotide Transformer. For each spot, gene-level embeddings are aggregated via expression-weighted and attention-based pooling, then fused with spatial-expression representations through a late fusion module. Extensive experiments demonstrate that SAINT consistently improves clustering performance across multiple datasets and validate the superiority, effectiveness, sensitivity, and transferability of our framework, confirming the complementary value of incorporating sequence-level priors into spatial transcriptomics clustering.
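
The pooling and fusion steps named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the learned `query` vector, the masking of unexpressed genes, and fusion by concatenation are all assumptions made for illustration, and the toy `gene_emb` matrix stands in for embeddings that would come from a pretrained Nucleotide Transformer.

```python
import numpy as np

def expression_weighted_pool(expr, gene_emb, eps=1e-8):
    """Average gene embeddings per spot, weighted by normalized expression."""
    weights = expr / (expr.sum(axis=1, keepdims=True) + eps)  # (spots, genes)
    return weights @ gene_emb                                 # (spots, dim)

def attention_pool(expr, gene_emb, query):
    """Softmax attention over the genes expressed in each spot.

    `query` is a hypothetical learned vector; here it is passed in directly.
    """
    scores = np.broadcast_to(gene_emb @ query, expr.shape)    # (spots, genes)
    # Assumption: unexpressed genes get a very low score so they are ignored.
    masked = np.where(expr > 0, scores, -1e9)
    masked = masked - masked.max(axis=1, keepdims=True)       # numerical stability
    attn = np.exp(masked)
    attn = attn / attn.sum(axis=1, keepdims=True)
    return attn @ gene_emb                                    # (spots, dim)

def late_fusion(*embeddings):
    """Assumed fusion rule: concatenate per-spot embeddings feature-wise."""
    return np.concatenate(embeddings, axis=1)

# Toy data: 2 spots, 3 genes, 2-dimensional gene embeddings.
expr = np.array([[1.0, 3.0, 0.0],
                 [0.0, 0.0, 2.0]])
gene_emb = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [1.0, 1.0]])
query = np.zeros(2)               # zero query gives uniform attention over expressed genes
spatial_emb = np.ones((2, 3))     # placeholder for a spatial-expression encoder output

weighted = expression_weighted_pool(expr, gene_emb)
attended = attention_pool(expr, gene_emb, query)
fused = late_fusion(spatial_emb, weighted, attended)
```

In this sketch the fused matrix has one row per spot and could be passed to any clustering routine; a trained fusion module would likely replace plain concatenation.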

Primary Area: General machine learning (supervised, unsupervised, online, active, etc.)

Submission Number: 26641
