Improving Classification and Data Imputation for Single-Cell Transcriptomics with Graph Neural Networks
Keywords: scRNA-seq, GNNs, GNN, data imputation, classification, cell classification
TL;DR: We apply standard GNN models to cell classification and data imputation (the latter using the GRAPE framework) on scRNA-seq data, achieving marginal improvements over an SVM benchmark in some cases.
Abstract: Single-cell RNA sequencing (scRNA-seq) provides vast amounts of gene expression data. In this paper, we benchmark several graph neural network (GNN) approaches for cell-type classification and imputation of missing values on single-cell gene expression data. For cell classification, we use a cell-cell graph representation and obtain the best performance with a graph convolutional network (GCN) model that includes a differentiable group normalisation (DGN) layer to alleviate oversmoothing, in conjunction with an adjacency matrix predetermined by spectral clustering. This method marginally outperforms an SVM benchmark model, 59.4\% compared to 58.6\%, on the Paul15 dataset, which describes the development of myeloid progenitors. Performance scales well with the number of gene expression features: on the PBMC3K dataset, which describes peripheral blood mononuclear cells and has a higher number of gene expression features, this method outperforms an SVM benchmark, 95.6\% vs 94.2\%. For data imputation, we model the data as a bipartite graph consisting of cell and gene nodes, with edge values signifying gene expression. We train a 3-layer GraphSAGE GNN to impute data by having it reconstruct the dataset based on the downstream task. When combined with this imputation model, GNN classification performance is similar, at 58\%, but the model exhibits better learning and generalisation characteristics. Our findings catalyse the development of new tools to analyse complex single-cell datasets.
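The bipartite representation used for imputation can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `build_bipartite_edges` and `split_edges`, the toy matrix, the choice to treat zero entries as unobserved, and the random hold-out of edges as reconstruction targets are all assumptions made for illustration.

```python
import random


def build_bipartite_edges(expression, drop_zeros=True):
    """Convert a cells x genes matrix into (cell, gene, value) edges.

    Each edge links one cell node to one gene node and carries the
    expression value. Skipping zeros is an assumption of this sketch.
    """
    edges = []
    for cell_idx, row in enumerate(expression):
        for gene_idx, value in enumerate(row):
            if drop_zeros and value == 0:
                continue
            edges.append((cell_idx, gene_idx, value))
    return edges


def split_edges(edges, holdout_fraction=0.3, seed=0):
    """Randomly hide a fraction of edges (hypothetical training setup).

    Returns (observed, held_out); a model would see the observed edges
    and be asked to predict the values of the held-out ones.
    """
    rng = random.Random(seed)
    shuffled = edges[:]
    rng.shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


# Toy expression matrix: 4 cells (rows) x 3 genes (columns).
expression = [
    [0.0, 2.0, 1.0],
    [3.0, 0.0, 0.0],
    [0.0, 1.0, 4.0],
    [5.0, 0.0, 2.0],
]

edges = build_bipartite_edges(expression)
observed, held_out = split_edges(edges, holdout_fraction=0.25, seed=0)
```

In this toy case the matrix yields seven cell-gene edges, one of which is held out. A message-passing model such as GraphSAGE would then operate on the observed edges of this graph.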