scSAMAC: saliency-adjusted masking induced attention contrastive learning for single-cell clustering

Published: 01 Jan 2025, Last Modified: 13 May 2025Briefings Bioinform. 2025EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Single-cell sequencing technology has enabled researchers to study cellular heterogeneity at the cell level. To facilitate the downstream analysis, clustering single-cell data into subgroups is essential. However, the high dimensionality, sparsity, and dropout events of the data make the clustering challenging. Currently, many deep learning methods have been proposed. Nevertheless, they either fail to fully utilize pairwise distances information between similar cells, or do not adequately capture their feature correlations. They cannot also effectively handle high-dimensional sparse data. Therefore, they are not suitable for high-fidelity clustering, leading to difficulties in analyzing the clear cell types required for downstream analysis. The proposed scSAMAC method integrates contrastive learning and negative binomial losses into a variational autoencoder, extracting features via contrastive unit similarity while preserving the intrinsic characteristics. This enhances the robustness and generalization during the clustering. In the contrastive learning, it constructs a mask module by adopting a negative sample generation method with gene feature saliency adjustment, which selects features more influential in the clustering phase and simulates data missing events. Additionally, it develops a novel loss, which consists of a soft k-means loss, a Wasserstein distance, and a contrastive loss. This fully utilizes data information and improves clustering performance. Furthermore, a multi-head attention mechanism module is applied to the latent variables at each layer of autoencoder to enhance feature correlation, integration, and information repair. Experimental results demonstrate that scSAMAC outperforms several state-of-the-art clustering methods.
Loading