Cancer molecular subtype classification by graph convolutional networks on multi-omics data

Bingjun Li, Tianyu Wang, Sheida Nabavi

Published: 01 Jan 2021, Last Modified: 01 Oct 2024. Venue: BCB 2021. License: CC BY-SA 4.0

Abstract: Cancer has been the second leading cause of death in the United States for decades, and accurate classification of cancers' molecular profiles is a key predictor of patients' survival. Recently, The Cancer Genome Atlas Research Network identified a new cancer taxonomy based on molecular tumor subtypes across 33 types of cancer. Several studies have reported models for traditional tissue-of-origin cancer type classification or for classifying the subtypes of a single cancer type. In this study, we propose a novel end-to-end deep learning model that incorporates prior biological knowledge and integrates multi-omics data to classify pan-cancer molecular subtypes. Our proposed model consists of three sections: i) a graph convolutional network that extracts localized features from an input graph built on a gene interaction network, which represents prior knowledge, with genes as nodes and multi-omics data as node features; ii) a fully connected neural network that extracts global features from the data; and iii) a classification layer that takes the combination of localized and global features as input. We examined building the input graph from gene-gene interaction networks, protein-protein interaction networks, and gene co-expression networks. We also investigated the effect of input graph size (number of genes/nodes) on the performance of the model. We evaluated the proposed model in terms of prediction accuracy, precision, recall, and F1 score, and compared it with three state-of-the-art deep learning models and two conventional machine learning models. The results show that the proposed model outperforms the baseline models at every input graph size (number of genes) tested. Our model achieves not only better prediction accuracy but also a lower false-negative rate, which is important for cancer patients' treatment. Our model also shows the benefit of employing multi-omics data compared with employing only single-omic data.
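
The three-section design described in the abstract can be illustrated with a minimal forward-pass sketch in plain Python. This is not the authors' implementation. It assumes a single graph convolution with the symmetric normalization D^{-1/2}(A + I)D^{-1/2} as a stand-in for whatever graph convolution the paper uses, and the toy graph, the two omics features per gene, the layer sizes, the random weights, and all function names are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m):
    """Element-wise ReLU on a matrix."""
    return [[max(0.0, v) for v in row] for row in m]


def softmax(v):
    """Numerically stable softmax over a vector."""
    top = max(v)
    exps = [math.exp(x - top) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def normalize_adjacency(adj):
    """Assumed propagation rule: D^{-1/2} (A + I) D^{-1/2}."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def random_matrix(rows, cols, rng):
    """Untrained placeholder weights; a real model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def forward(adj, x, params):
    """adj: genes x genes graph; x: genes x omics node features."""
    # i) Graph branch: localized features from neighbouring genes.
    a_norm = normalize_adjacency(adj)
    hidden = relu(matmul(matmul(a_norm, x), params["w_gcn"]))
    local = [v for row in hidden for v in row]
    # ii) Fully connected branch: global features from all genes at once.
    flat = [[v for row in x for v in row]]
    global_feats = relu(matmul(flat, params["w_fc"]))[0]
    # iii) Classification layer on the concatenated features.
    logits = matmul([local + global_feats], params["w_out"])[0]
    return softmax(logits)


rng = random.Random(0)
n_genes, n_omics, n_hidden, n_global, n_classes = 4, 2, 3, 5, 3

# Hypothetical gene interaction graph (a chain of four genes).
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
# Hypothetical node features: two omics measurements per gene.
x = [[rng.random() for _ in range(n_omics)] for _ in range(n_genes)]

params = {
    "w_gcn": random_matrix(n_omics, n_hidden, rng),
    "w_fc": random_matrix(n_genes * n_omics, n_global, rng),
    "w_out": random_matrix(n_genes * n_hidden + n_global, n_classes, rng),
}

probs = forward(adj, x, params)
print(probs)  # one probability per (hypothetical) molecular subtype
```

The graph branch only mixes information between genes connected in the prior-knowledge network, while the fully connected branch sees all genes together, which is why the two feature sets are concatenated before classification. Changing the input graph (gene-gene, protein-protein, or co-expression) only changes `adj` in this sketch.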