Keywords: scRNA-seq, GNN, data imputation
Abstract: Single-cell RNA sequencing (scRNA-seq) provides vast amounts of gene expression data. In this paper, we benchmark several graph neural network (GNN) approaches for cell-type classification using imputed single-cell gene expression data. We model the data in the Paul15 dataset, describing the development of myeloid progenitors, as a bipartite graph consisting of cell and gene nodes, with edge values signifying gene expression. We train a 3-layer GraphSage GNN to impute data by training it to reconstruct the dataset based on a downstream cell classification task. For this, we use a cell-cell graph representation on a small graph convolutional network (GCN) and an adjacency matrix predetermined by spectral clustering. When combined with the data imputation model, GNN classification performance is 58\%, marginally worse than an SVM benchmark of 59.4\%, however exhibits better learning and generalisation characteristics along with producing an auxiliary imputation model. Our findings catalyse the development of new tools to analyse complex single-cell datasets.