Keywords: Cell type annotation, Contextualized Association Scoring, Differentiated supervisory strategies, Context-aware self-training framework
Abstract: Single-cell annotation is a fundamental task in the analysis of single-cell data, and one promising research direction relies on the marker gene information accumulated in biology. Recently, self-training strategies have been introduced into the field, significantly improving annotation accuracy by iteratively optimizing the model. However, existing methods have not yet systematically explored how to construct self-training frameworks that are better suited to single-cell data. To this end, we propose CSSTA, a context-aware self-training model. First, the contextual information of marker genes is introduced to make marker genes more compatible with different single-cell datasets and thereby generate high-quality pseudo-labels. Second, we design strategies for recognizing and supervising high- and low-confidence pseudo-labels that are better suited to single-cell data and can better guide model optimization. Finally, the insight of a single-cell foundation model into cell-cell association information is incorporated through a GNN. Experiments demonstrate that introducing marker gene contextual information significantly improves the ability to recognize cell-cell type associations with heuristic-based strategies. Benchmark experiments show that CSSTA significantly outperforms state-of-the-art methods. Notably, by extending CSSTA to hierarchies, we demonstrate its potential for hierarchical cell annotation.
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 16036