Interpretable Self-Supervised Prototype Learning for Single-Cell Transcriptomics

Published: 06 Mar 2025, Last Modified: 18 Apr 2025
Venue: ICLR 2025 Workshop LMRL
License: CC BY 4.0
Track: Full Paper Track
Keywords: self-supervised learning, representation learning, prototype learning, single-cell transcriptomics, interpretable machine learning, denoising, biological data analysis, kNN graph, batch effect removal, unsupervised learning, metacells, cell state representation, marker gene identification, geometric structure preservation, contrastive learning, deep learning, deep learning for biology, graph-based learning, structured representation learning
TL;DR: scProto is an interpretable self-supervised method that learns prototypes and decodes them into metacells, which are denoised representations that enhance biological interpretation, preserve cell structure, and remove batch effects in single-cell transcriptomics.
Abstract: Single-cell transcriptomic data are inherently noisy and sparse, which makes it difficult to uncover the underlying biological mechanisms. Reliable biological interpretation therefore requires effective denoising. Self-supervised learning has emerged as a powerful approach for learning robust representations across large single-cell datasets, improving denoising and supporting more accurate biological insights. In this work, we present scProto, an interpretable self-supervised learning framework that learns prototypes, which are subsequently decoded into metacells: denoised representations that aggregate information from multiple similar cells across datasets. These metacells enhance robustness, mitigate noise, and provide a more stable and biologically meaningful representation of cell states. Beyond denoising, scProto is designed to preserve the structural relationships in the k-nearest neighbor (kNN) graph of the input space while simultaneously removing batch effects through self-supervised prototype learning. The loss function ensures that all cell populations, including rare ones, are well represented by prototypes. We demonstrate that scProto metacells effectively capture marker genes, leading to improved cell-type distinction. Model performance is evaluated using scGraph metrics, which assess how well cell similarity structures and geometric relationships are preserved in the embedding space; on these metrics, scProto generally outperforms other methods. Batch effect removal and biological conservation are assessed using scIB metrics, which indicate that scProto performs on par with the best-performing models while better preserving structural relationships in the embedding space.
Attendance: Fatemeh S. Hashemi G.
Submission Number: 98
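
The abstract refers to preserving the kNN graph of the input space in the learned embedding. As a rough illustration of what that kind of preservation means, the sketch below computes the average fraction of each cell's input-space neighbors that are still its neighbors in an embedding. This is neither the scGraph metric nor the scProto loss; the function names `knn_indices` and `knn_overlap`, the Euclidean distance, and the toy data are all assumptions made for the example.

```python
import numpy as np

def knn_indices(points, k):
    """Return the indices of the k nearest neighbours of each row (self excluded)."""
    # Pairwise squared Euclidean distances, shape (n, n).
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)
    # Exclude each point from its own neighbour list.
    np.fill_diagonal(dists, np.inf)
    return np.argsort(dists, axis=1)[:, :k]

def knn_overlap(inputs, embedding, k=5):
    """Mean fraction of input-space neighbours that are kept in the embedding."""
    nn_in = knn_indices(inputs, k)
    nn_emb = knn_indices(embedding, k)
    shared = [len(set(a) & set(b)) for a, b in zip(nn_in, nn_emb)]
    return float(np.mean(shared)) / k

# Toy data (hypothetical): 50 "cells" with 20 "genes" each.
rng = np.random.default_rng(0)
cells = rng.normal(size=(50, 20))

# An embedding with identical geometry keeps every neighbour.
same = knn_overlap(cells, cells.copy())
# A random 2-D embedding unrelated to the input keeps only some of them.
unrelated = knn_overlap(cells, rng.normal(size=(50, 2)))
```

A score near 1 indicates that local neighborhoods survive the embedding, while a score near `k / (n - 1)` is what an unrelated embedding would give by chance. The dense distance matrix used here is only practical for small toy inputs.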