Keywords: Self-Supervised Learning, Single-cell Clustering
Abstract: Single-cell clustering of scRNA-seq data is a common and challenging problem: given gene expression sequences from single-cell RNA data, the goal is to predict cell subtype clusters. Previous models applied classical methods (e.g., Principal Component Analysis, K-means) to well-annotated data to classify cells. However, these methods rely heavily on the expected number of clusters as an input. To address this problem, we propose mask-sc, a novel multimodal self-supervised framework with masked expression modeling on single-cell data that learns compact and discriminative representations for scRNA-seq clustering by reconstructing masked gene expression. mask-sc aggregates high-frequency interconnections across multiple groups of expression sequences via a masked expression encoder applied to expression matrices. A sequence-guided decoder then recovers sequence-level features of the masked expression matrices. Finally, the representations extracted from the gene expression encoder are used for scRNA-seq clustering. We conduct extensive experiments on two scRNA-seq datasets, and the empirical results demonstrate the effectiveness of mask-sc against previous baselines.
Submission Number: 24
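Illustrative sketch: the masked expression modeling idea described in the abstract (hide part of the expression matrix, encode the rest, reconstruct the hidden entries) can be sketched in a few lines of NumPy. This is not the mask-sc architecture. It assumes a plain linear encoder and decoder as stand-ins, zero-filling for masked entries, and a mean squared error computed on masked entries only. All function names, parameters, and defaults below are hypothetical.

```python
import numpy as np


def mask_expression(x, mask_ratio=0.25, rng=None):
    """Randomly hide a fraction of entries in a cells-by-genes matrix.

    Returns the zero-filled matrix and the boolean mask (True = hidden).
    """
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(x.shape) < mask_ratio
    return np.where(mask, 0.0, x), mask


def masked_mse(x, x_hat, mask):
    """Mean squared reconstruction error over the masked entries only."""
    n_masked = mask.sum()
    if n_masked == 0:
        return 0.0
    return float((((x_hat - x) ** 2) * mask).sum() / n_masked)


def train_masked_autoencoder(x, latent_dim=8, mask_ratio=0.25,
                             lr=1e-2, epochs=200, seed=0):
    """Fit a linear encoder/decoder by gradient descent on masked_mse.

    Assumption: linear maps replace the paper's encoder and decoder.
    """
    rng = np.random.default_rng(seed)
    n_genes = x.shape[1]
    w_enc = rng.normal(0.0, 0.1, (n_genes, latent_dim))
    w_dec = rng.normal(0.0, 0.1, (latent_dim, n_genes))
    losses = []
    for _ in range(epochs):
        x_masked, mask = mask_expression(x, mask_ratio, rng)
        z = x_masked @ w_enc            # per-cell latent representation
        x_hat = z @ w_dec               # reconstructed expression
        losses.append(masked_mse(x, x_hat, mask))
        # Gradient of masked_mse with respect to x_hat.
        d_x_hat = 2.0 * (x_hat - x) * mask / max(mask.sum(), 1)
        grad_dec = z.T @ d_x_hat
        grad_enc = x_masked.T @ (d_x_hat @ w_dec.T)
        w_dec -= lr * grad_dec
        w_enc -= lr * grad_enc
    return w_enc, w_dec, losses
```

After training, the unmasked data can be passed through the encoder (`embeddings = x @ w_enc`) and the resulting per-cell embeddings handed to any clustering routine. Normalizing the input first (for example with `np.log1p` on raw counts) is a common preprocessing choice, not something the abstract specifies.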