Deep Learning for Accurate Diagnosis of Viral Infections through scRNA-seq Analysis: A Comprehensive Benchmark Study

Published: 21 Feb 2025, Last Modified: 21 Feb 2025
Accepted by DMLR
License: CC BY-SA 4.0
Abstract: Infectious disease diagnostics rely primarily on physicians' clinical expertise and rapid antigen/antibody tests, a subjective approach that is prone to error because it depends on factors such as the accuracy of the patient history and the physician's experience. To address these challenges, we propose a biological evidence-based diagnostic tool that uses deep learning to analyze patient-derived single-cell RNA sequencing (scRNA-seq) profiles from blood samples. scRNA-seq provides high-resolution gene expression data at the single-cell level, capturing the unique transcriptional signatures and immunological responses induced by different viral infections. In this work, we conduct a first-of-its-kind benchmark study evaluating five computational models: four deep learning-based methods (contrastiveVI, scVI, SAVER, scGPT) and PCA as a baseline, all trained and evaluated on patient-derived scRNA-seq datasets that we carefully sourced. We assess their efficacy in distinguishing scRNA-seq profiles associated with various viral infections, aiming to identify distinct immunological features representative of each infection. The results demonstrate that contrastiveVI outperforms the other models on all key performance metrics and in the visual quality of the resulting clusters. Our research further underscores the substantial influence of batch effects when analyzing scRNA-seq data from multiple sources. Overall, our study demonstrates that deep learning models can accurately identify the type of infection from patient plasma samples based on scRNA-seq profiles, and can improve the accuracy and specificity of infectious disease diagnosis. This research contributes to the development of more objective, evidence-based diagnostic methods in the infectious disease domain, potentially reducing diagnostic errors and improving patient outcomes.
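To make the shape of such a benchmark concrete, the following is a minimal sketch of what the PCA baseline stage could look like. It is not the authors' pipeline. The synthetic count matrix, the labels `infection_A` and `infection_B`, the log-normalization step, and the nearest-centroid evaluation are all assumptions introduced for illustration; the abstract does not specify the preprocessing or the metrics actually used.

```python
import math
import random

def log_normalize(counts, target_sum=1e4):
    """Scale each cell to a common total, then apply log1p (an assumed preprocessing step)."""
    normalized = []
    for cell in counts:
        total = sum(cell) or 1.0
        normalized.append([math.log1p(c / total * target_sum) for c in cell])
    return normalized

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pca(X, k=2, iters=300, seed=0):
    """Project rows of X onto the top-k principal axes (power iteration with deflation)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in X]
    residual = [row[:] for row in centered]
    components = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(d)]
        scale = math.sqrt(dot(v, v))
        v = [x / scale for x in v]
        for _ in range(iters):
            scores = [dot(row, v) for row in residual]            # residual @ v
            w = [sum(s * row[j] for s, row in zip(scores, residual))
                 for j in range(d)]                               # residual.T @ (residual @ v)
            norm = math.sqrt(dot(w, w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        components.append(v)
        # Deflate: remove the variance explained by this axis before finding the next one.
        residual = [[x - dot(row, v) * vj for x, vj in zip(row, v)] for row in residual]
    return [[dot(row, v) for v in components] for row in centered]

def nearest_centroid_accuracy(train, train_labels, test, test_labels):
    """Fraction of test points whose closest class centroid matches their label."""
    centroids = {}
    for label in set(train_labels):
        pts = [p for p, l in zip(train, train_labels) if l == label]
        centroids[label] = [sum(c) / len(pts) for c in zip(*pts)]
    correct = 0
    for p, l in zip(test, test_labels):
        pred = min(centroids, key=lambda c: math.dist(p, centroids[c]))
        correct += pred == l
    return correct / len(test)

# Synthetic toy data (hypothetical): 40 cells x 6 genes, two made-up infection groups
# that differ in which genes are highly expressed.
rng = random.Random(0)

def toy_cell(high_genes):
    return [rng.randint(20, 40) if g in high_genes else rng.randint(1, 5) for g in range(6)]

counts = [toy_cell({0, 1}) for _ in range(20)] + [toy_cell({2, 3}) for _ in range(20)]
labels = ["infection_A"] * 20 + ["infection_B"] * 20

embedding = pca(log_normalize(counts), k=2)

# Even-indexed cells define the centroids; odd-indexed cells are held out for scoring.
train_idx, test_idx = range(0, 40, 2), range(1, 40, 2)
accuracy = nearest_centroid_accuracy(
    [embedding[i] for i in train_idx], [labels[i] for i in train_idx],
    [embedding[i] for i in test_idx], [labels[i] for i in test_idx],
)
print(f"held-out nearest-centroid accuracy: {accuracy:.2f}")
```

In this sketch the embedding is fit on all cells without labels and only the classifier is split into training and held-out cells. A real comparison would replace `pca` with each model's learned latent representation and would need to account for batch effects across data sources, which the abstract flags as a substantial influence.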
Keywords: Single Cell, RNA-seq, Deep Learning, Gene Expression
Changes Since Last Submission: This latest version incorporates all the revisions outlined in the rebuttal, including refined language and additional clarifications. Additionally, we have created and included an independent webpage (https://github.com/Ziweiyang9/Diagnosis-DMLR-dataset-details/tree/main) that provides comprehensive details on data preprocessing, dataset links, and benchmarking results.
Assigned Action Editor: ~Sergio_Escalera1
Submission Number: 80