Keywords: spatial proteomics clustering, cross-view contrastive learning, imbalanced learning, unsupervised learning
Abstract: Single-cell spatial proteomics reveals protein expression patterns while preserving the spatial structure of tissues, providing valuable insights into cellular functions and disease mechanisms. Clustering spatial proteomics data is a fundamental step in such studies, but it remains at an early exploratory stage and faces at least two prominent challenges: i) Functional regions within tissues often differ in area and contain imbalanced numbers of cells, which biases the model toward the features of majority classes and obscures the characteristics of minority ones. ii) Cellular identity is shaped by both intrinsic protein expression and the external spatial microenvironment; however, the heterogeneity of these two information sources, and the potential conflicts between them, make it difficult to identify subtle yet biologically significant cellular states. To address these issues, we propose a deep clustering framework named spClust. Our approach first introduces a spatially constrained synthetic minority oversampling technique that generates biologically meaningful cells of minority classes, alleviating the feature bias caused by cell type imbalance. We then construct a spatial adjacency graph and an expression similarity graph over the cells, forming a decoupled dual-view contrastive learning architecture. Finally, we define an adaptive mechanism that fuses the dual-view features and assigns soft cluster labels using dynamic prototypes, and we further refine the labels by optimizing a modularity-based loss. Extensive experiments on spatial proteomics datasets demonstrate that spClust effectively identifies minority cells and improves the separation of distinct cell types.
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 24873
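The two cell graphs mentioned in the abstract (a spatial adjacency graph and an expression similarity graph) can be illustrated with a minimal sketch. The abstract does not specify how spClust builds them, so everything here is an assumption made for illustration: the symmetric k-nearest-neighbor construction, Euclidean distance on coordinates, cosine similarity on expression, the value of `k`, the function names `knn_graph` and `cosine_knn_graph`, and the random toy data.

```python
import numpy as np


def knn_graph(points: np.ndarray, k: int) -> np.ndarray:
    """Symmetric binary kNN adjacency from Euclidean distances (assumed construction)."""
    n = points.shape[0]
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a cell is never its own neighbor
    nbrs = np.argsort(dist, axis=1)[:, :k]  # k closest cells per row
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), k), nbrs.ravel()] = True
    return adj | adj.T  # symmetrize: keep an edge if either end selected it


def cosine_knn_graph(expr: np.ndarray, k: int) -> np.ndarray:
    """Symmetric binary kNN adjacency from cosine similarity (assumed construction)."""
    n = expr.shape[0]
    norm = np.linalg.norm(expr, axis=1, keepdims=True)
    unit = expr / np.clip(norm, 1e-12, None)  # guard against all-zero profiles
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-similarity
    nbrs = np.argsort(-sim, axis=1)[:, :k]  # k most similar cells per row
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), k), nbrs.ravel()] = True
    return adj | adj.T


# Toy data (hypothetical): 50 cells, 2-D coordinates, 20 protein markers.
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 100.0, size=(50, 2))
expr = rng.normal(size=(50, 20))

A_spatial = knn_graph(coords, k=6)      # view 1: spatial neighborhood
A_expr = cosine_knn_graph(expr, k=6)    # view 2: expression similarity
```

Keeping the two adjacency matrices separate, rather than merging them into one graph, is what would let each view be encoded independently before any fusion step, which is one plausible reading of the "decoupled dual-view" design.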