Differential Gene Expression Analysis of the Most Relevant Genes for Lung Cancer Prediction and Sub-type Classification
Abstract: Early diagnosis of cancer is essential for a good prognosis, and identifying differentially expressed genes enables a more personalized treatment plan in which therapy can target those genes. This work proposes a pipeline that predicts the presence of lung cancer and its subtype, and that allows the identification of differentially expressed genes for the lung adenocarcinoma and lung squamous cell carcinoma subtypes. A gradient boosted tree model based on RNA-seq data is used for the classification tasks. The main focus of the present work is the analysis of the gene expression features that best differentiate cancerous from normal tissue, and of those that distinguish between lung cancer subtypes. Differentially expressed genes are analyzed by hierarchical clustering in order to identify signatures of commonly regulated genes and biological signatures associated with a specific subtype. This analysis highlighted patterns of commonly regulated genes, some already known in the literature as cancer-specific or subtype-specific genes and others not yet documented.
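To make the clustering step concrete, the following is a minimal sketch of how hierarchical clustering can group genes with similar expression patterns into signatures. It is not the pipeline's actual implementation: the abstract does not state the distance metric or linkage criterion, so the choice of 1 - Pearson correlation with average linkage is an assumption, as are the function names (`pearson`, `cluster_genes`), the placeholder gene names, and the toy expression values (three hypothetical normal samples followed by three hypothetical tumor samples). The gradient boosted tree classification step is not covered here.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes non-zero variance)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def cluster_genes(profiles, n_clusters):
    """Agglomerative clustering of genes, merging until n_clusters remain.

    Assumed choices (not taken from the paper): distance = 1 - Pearson
    correlation, linkage = average distance between cluster members.
    """
    genes = list(profiles)
    # Pairwise distances between individual gene profiles.
    dist = {
        frozenset((g, h)): 1.0 - pearson(profiles[g], profiles[h])
        for g, h in combinations(genes, 2)
    }

    def linkage(a, b):
        # Average linkage: mean distance over all cross-cluster gene pairs.
        return mean(dist[frozenset((g, h))] for g in a for h in b)

    clusters = [[g] for g in genes]  # start with one cluster per gene
    while len(clusters) > n_clusters:
        # Find and merge the two closest clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [sorted(c) for c in clusters]


# Toy, made-up expression values: samples 1-3 "normal", samples 4-6 "tumor".
profiles = {
    "gene_up_1": [1.0, 1.2, 0.9, 5.0, 5.5, 4.8],
    "gene_up_2": [2.0, 2.1, 1.8, 7.9, 8.3, 8.0],
    "gene_down_1": [6.0, 6.3, 5.9, 1.1, 0.8, 1.0],
    "gene_down_2": [9.0, 8.7, 9.2, 3.0, 3.3, 2.9],
}

signatures = sorted(cluster_genes(profiles, n_clusters=2))
print(signatures)
```

In this toy setting the genes that rise together in the tumor samples fall into one cluster and the genes that fall together into another, which is the sense in which a cluster can be read as a commonly regulated signature. For real RNA-seq matrices, an established library routine for hierarchical clustering would be the more practical choice than this quadratic pure-Python loop.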