Surrogate-Assisted Multiobjective Gene Selection for Cell Classification From Large-Scale Single-Cell RNA Sequencing Data
Abstract: Accurate cell classification is crucial but computationally expensive in large-scale single-cell RNA sequencing (scRNA-seq) analysis. Gene selection (GS) is a pivotal technique for identifying gene subsets of scRNA-seq data that improve classification accuracy while reducing the number of genes. Nevertheless, the growing scale of scRNA-seq data challenges existing GS methods in terms of both performance and computational time. We therefore propose a surrogate-assisted evolutionary algorithm for multiobjective GS to address these deficiencies. A two-phase initialization method selects sparse solutions that provide preliminary insight into gene contributions. A binary competitive swarm optimizer then performs the global search, with an embedded local search method that eliminates irrelevant genes to improve efficiency. Additionally, a surrogate model predicts classification accuracy efficiently and substitutes for part of the computationally expensive classification process. Experiments on eight large-scale scRNA-seq datasets with more than 20 000 genes validate the effectiveness of the proposed GS method for scRNA-seq cell classification against eight state-of-the-art methods. Gene expression analysis further confirms the significance of the genes selected by the proposed method for classifying scRNA-seq data.
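To make the multiobjective framing concrete, the following is a minimal sketch of Pareto dominance over two objectives. It assumes, based only on the abstract's stated goals, that each candidate gene subset is scored by classification error and by the number of selected genes, both minimized. The function names and the candidate values are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Assumption: each candidate gene subset is summarized by two objectives
# to minimize, (classification_error, number_of_selected_genes).
Objectives = Tuple[float, int]


def dominates(a: Objectives, b: Objectives) -> bool:
    """Return True if a is no worse than b in both objectives and strictly better in one."""
    no_worse = a[0] <= b[0] and a[1] <= b[1]
    strictly_better = a[0] < b[0] or a[1] < b[1]
    return no_worse and strictly_better


def non_dominated(points: List[Objectives]) -> List[Objectives]:
    """Keep the points that no other point dominates (the Pareto front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates for illustration only: (error rate, gene count).
candidates = [(0.10, 500), (0.12, 120), (0.10, 300), (0.20, 100), (0.25, 150)]
front = non_dominated(candidates)
print(front)
```

In this illustration, a subset with the same error but fewer genes displaces a larger one, while subsets that trade accuracy for size remain on the front together.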
External IDs: dblp:journals/tec/LinHJHJ25