scSAG$^{2}$E: Sparse Autoencoders With Gene Graph Embedding for scRNA-Seq Data Clustering

Published: 2025, Last Modified: 23 Jan 2026. IEEE Trans. Comput. Biol. Bioinform. 2025. License: CC BY-SA 4.0
Abstract: Recent advances in single-cell RNA sequencing (scRNA-seq) technology have enabled large-scale transcriptome analysis with high efficiency and single-cell resolution. Clustering is a crucial step in scRNA-seq analysis for identifying and categorizing new cell types and gene expression patterns. However, accurate cell clustering remains challenging because scRNA-seq data are high-dimensional and complex. To address this challenge, we propose a novel deep scRNA-seq clustering framework called sparse autoencoders with gene graph embedding (scSAG$^{2}$E). In scSAG$^{2}$E, two autoencoders first learn low-dimensional representations of cells and genes, respectively, and the gene expression matrix is then reconstructed by multiplying these two representations. To preserve the manifold structure of cells, we incorporate graph regularization into the cell autoencoder. In addition, we impose sparsity constraints to handle the sparsity of the gene expression matrix. We also construct a gene graph using the $K$-nearest-neighbor ($K$NN) algorithm and feed it into graph convolutional networks (GCNs). This captures the underlying structure among genes and enhances the signal of differentially expressed genes, yielding a more accurate representation of scRNA-seq data. Extensive experiments on several publicly available scRNA-seq datasets show that scSAG$^{2}$E outperforms several state-of-the-art single-cell analysis methods in clustering.
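The abstract names the components of scSAG$^{2}$E without giving their exact forms. The following NumPy sketch shows, under stated assumptions, one way they can fit together: a Euclidean $K$NN gene graph passed through one symmetric-normalized GCN layer, reconstruction of the expression matrix as the product of cell and gene embeddings, a Laplacian penalty $\mathrm{tr}(Z^\top L Z)$ as the cell graph regularizer, and an L1 penalty as the sparsity constraint. The single linear encoders, the distance metric, `k`, and the weights `lam` and `beta` are illustrative assumptions; `knn_graph`, `gcn_layer`, `laplacian_penalty`, and `objective` are hypothetical helpers; nothing is trained. This is not the authors' implementation.

```python
import numpy as np


def knn_graph(features: np.ndarray, k: int) -> np.ndarray:
    """Symmetric 0/1 kNN adjacency over the rows of `features` (Euclidean distance, assumed)."""
    n = features.shape[0]
    sq = np.sum(features ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * features @ features.T
    np.fill_diagonal(dist, np.inf)             # a node is not its own neighbour
    nbrs = np.argsort(dist, axis=1)[:, :k]     # indices of the k nearest rows
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return np.maximum(adj, adj.T)              # symmetrize: i~j if either picks the other


def gcn_layer(adj: np.ndarray, h: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One GCN step, ReLU(D^-1/2 (A + I) D^-1/2 H W), with Kipf-Welling normalization."""
    a_hat = adj + np.eye(adj.shape[0])         # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ h @ weight, 0.0)


def laplacian_penalty(adj: np.ndarray, z: np.ndarray) -> float:
    """Graph regularizer tr(Z^T L Z), L = D - A; small when neighbouring rows have similar codes."""
    lap = np.diag(adj.sum(axis=1)) - adj
    return float(np.trace(z.T @ lap @ z))


def objective(x, z_cell, z_gene, cell_adj, lam=0.1, beta=0.01):
    """Reconstruction + graph regularization + L1 sparsity (lam, beta are hypothetical weights)."""
    x_hat = z_cell @ z_gene.T                  # (cells x d) @ (d x genes) -> cells x genes
    recon = float(np.sum((x - x_hat) ** 2))
    graph_reg = laplacian_penalty(cell_adj, z_cell)
    sparsity = float(np.sum(np.abs(z_cell)))   # assumed form of the sparse constraint
    return recon + lam * graph_reg + beta * sparsity, x_hat


rng = np.random.default_rng(0)
n_cells, n_genes, d = 30, 20, 4
x = np.log1p(rng.poisson(1.0, size=(n_cells, n_genes)).astype(float))  # toy log-counts

# Cell branch: one linear map + ReLU stands in for the cell encoder (assumption).
z_cell = np.maximum(x @ rng.normal(scale=0.1, size=(n_genes, d)), 0.0)
cell_adj = knn_graph(x, k=5)                   # cell graph used only by the regularizer

# Gene branch: kNN graph over gene profiles (columns of X), then one GCN layer.
gene_adj = knn_graph(x.T, k=5)
z_gene = gcn_layer(gene_adj, x.T, rng.normal(scale=0.1, size=(n_cells, d)))

loss, x_hat = objective(x, z_cell, z_gene, cell_adj)
print(x_hat.shape)  # (30, 20)
print(round(loss, 3))
```

In a trained model, the encoder weights would be learned by minimizing an objective of this kind with an autodiff framework, and the resulting cell codes would then be clustered (for example with k-means); both steps are omitted from this sketch.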