Keywords: Multiple Instance Learning, Transcriptomics, scRNA-seq, snRNA-seq
TL;DR: A robust cell type-aware pooling strategy improves phenotype prediction from scRNA-seq, especially when cell types are sparse.
Abstract: Single-cell RNA sequencing (scRNA-seq) enables high-resolution profiling of cellular heterogeneity, offering a promising foundation for predicting phenotypes such as disease status. We propose a pooling strategy that utilizes cell type annotations by first aggregating cell representations within each cell type, followed by integration of cell type representations into a sample-level representation. Evaluated across three scRNA-seq datasets of varying sizes and biological contexts, our model consistently outperforms baseline models in phenotype classification. Our model is particularly effective in datasets with missing or sparsely represented cell types. These results underscore the importance of carefully incorporating cell type information for robust phenotype prediction from scRNA-seq data.
Submission Number: 12
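The two-stage pooling described in the abstract (aggregate cells within each cell type, then integrate the cell type representations into a sample-level representation) might look roughly like the sketch below. This is an illustration under stated assumptions, not the authors' model: mean pooling is used at both stages, absent cell types are simply masked out, and the function name `cell_type_aware_pool` and its parameters are hypothetical. The abstract does not specify the actual aggregation operators.

```python
import numpy as np

def cell_type_aware_pool(cell_embeddings, cell_type_ids, n_cell_types):
    """Hypothetical two-stage pooling: cells -> cell types -> sample.

    cell_embeddings: (n_cells, dim) array of per-cell representations.
    cell_type_ids:   (n_cells,) integer array of cell type annotations.
    n_cell_types:    total number of annotated cell types in the dataset.
    """
    cell_embeddings = np.asarray(cell_embeddings, dtype=float)
    cell_type_ids = np.asarray(cell_type_ids)
    dim = cell_embeddings.shape[1]

    type_reprs = np.zeros((n_cell_types, dim))
    present = np.zeros(n_cell_types, dtype=bool)

    # Stage 1: aggregate cells within each cell type (assumed: mean pooling).
    for t in range(n_cell_types):
        mask = cell_type_ids == t
        if mask.any():
            type_reprs[t] = cell_embeddings[mask].mean(axis=0)
            present[t] = True

    if not present.any():
        raise ValueError("sample contains no cells")

    # Stage 2: integrate cell type representations into one sample vector,
    # skipping cell types that are missing from this sample (assumed: mean).
    sample_repr = type_reprs[present].mean(axis=0)
    return sample_repr, type_reprs, present


# Toy sample: two cells of type 0, none of type 1, one cell of type 2.
embeddings = np.array([[1.0, 1.0], [3.0, 3.0], [10.0, 0.0]])
types = np.array([0, 0, 2])
sample_repr, type_reprs, present = cell_type_aware_pool(embeddings, types, 3)
```

In this toy sample the rare type 2 contributes as much to the sample vector as the more abundant type 0, whereas a flat mean over all cells would weight each cell type by its cell count. Whether equal weighting is appropriate is a design choice the sketch assumes rather than one taken from the source.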