A gene ranking method using text-mining for the identification of disease related genes

Published: 01 Jan 2010, Last Modified: 05 Nov 2023, BIBM 2010, Readers: Everyone
Abstract: Microarray gene expression profiles have been widely used to prioritize candidate genes when identifying significant genes involved in specific diseases. In this paper, we propose a new gene ranking method that combines gene-gene relations extracted from the literature with gene expression scores obtained from microarrays. The gene-gene relations are extracted with a hybrid approach that combines syntactic analysis with co-occurrence. Specifically, we parse the text syntactically and then, within each clause of a parsed sentence, treat the co-occurring gene names as mutually related. The gene network derived from these gene-gene relations and the gene expression scores are both given as inputs to the GeneRank algorithm. To evaluate our approach, we conducted experiments with publicly available prostate cancer data. The results show that our method achieves higher precision and recall than the original GeneRank, which uses gene-gene relations built from gene ontology annotations. Furthermore, our hybrid approach to gene-gene relation extraction places truly disease-related genes in the top ranks more effectively than the popular existing co-occurrence approach.
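The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the syntactic parser has already split each sentence into clauses and tagged the gene names in them, and it assumes the commonly cited GeneRank update r = (1 - d) * ex + d * W * D^-1 * r, which the abstract itself does not state. The function names `build_network` and `gene_rank`, the damping value `d=0.5`, and the placeholder gene names are all hypothetical.

```python
from itertools import combinations


def build_network(clauses):
    """Link every pair of distinct gene names found in the same clause.

    `clauses` is a list of gene-name lists, one per parsed clause
    (assumed to come from an upstream parser and gene tagger).
    """
    edges = set()
    for genes in clauses:
        for a, b in combinations(sorted(set(genes)), 2):
            edges.add((a, b))
    return edges


def gene_rank(edges, expression, d=0.5, iterations=100):
    """Fixed-point iteration of r = (1 - d) * ex + d * W D^-1 r.

    `expression` maps gene -> expression score; absolute values are
    normalised to sum to 1 (assumes at least one non-zero score).
    `d` balances network evidence against expression evidence.
    """
    genes = sorted(expression)
    total = sum(abs(v) for v in expression.values())
    ex = {g: abs(expression[g]) / total for g in genes}

    # Undirected adjacency restricted to genes that have a score.
    neighbours = {g: set() for g in genes}
    for a, b in edges:
        if a in neighbours and b in neighbours:
            neighbours[a].add(b)
            neighbours[b].add(a)

    r = dict(ex)
    for _ in range(iterations):
        # Each neighbour n passes on its rank divided by its degree.
        r = {
            g: (1 - d) * ex[g]
            + d * sum(r[n] / len(neighbours[n]) for n in neighbours[g])
            for g in genes
        }
    return r


if __name__ == "__main__":
    # Placeholder clauses and scores, purely for illustration.
    clauses = [["A", "B"], ["A", "C", "B"], ["C"]]
    scores = {"A": 2.0, "B": 1.0, "C": 1.0, "D": 0.0}
    ranks = gene_rank(build_network(clauses), scores)
    for gene in sorted(ranks, key=ranks.get, reverse=True):
        print(gene, round(ranks[gene], 4))
```

Restricting co-occurrence to a clause rather than a whole sentence or abstract is what distinguishes the hybrid extraction from plain co-occurrence in this sketch; everything downstream of `build_network` is unchanged by that choice.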