Majority Vote Cascading: A Semi-Supervised Framework for Improving Protein Function Prediction

Published: 01 Jan 2022, Last Modified: 17 May 2023 | IEEE ACM Trans. Comput. Biol. Bioinform. 2022 | Readers: Everyone
Abstract: A method to improve protein function prediction for sparsely annotated protein-protein interaction (PPI) networks is introduced. The method extends the diffusion state distance (DSD) majority vote algorithm introduced by Cao et al. to give confidence scores on predicted labels and to use predictions of high confidence to predict the labels of other nodes in subsequent rounds. We call this a majority vote cascade. Several cascade variants are tested in a stringent cross-validation experiment on PPI networks from S. cerevisiae and D. melanogaster, and we show that for many different settings with several alternative confidence functions, cascading improves the accuracy of the predictions. A list of the most confident new label predictions in the two networks is also reported. Code and networks for the cross-validation experiments appear at http://bcb.cs.tufts.edu/cascade.
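
The cascade idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code (which is at the URL above) and makes several assumptions for illustration only: `dist` is a precomputed pairwise distance matrix standing in for DSD, each node carries a single label, the confidence function is simply the fraction of the k nearest labeled voters that agree with the winning label, and the names `majority_vote`, `cascade`, `k`, `threshold`, and `max_rounds` are hypothetical. The paper evaluates several alternative confidence functions that are not reproduced here.

```python
from collections import Counter


def majority_vote(node, dist, labels, k):
    """Predict a label for `node` from its k nearest labeled nodes.

    Returns (label, confidence). Confidence here is the share of voters
    backing the winning label; this is an assumed, illustrative choice.
    """
    voters = sorted(
        (v for v in labels if v != node),
        key=lambda v: (dist[node][v], v),  # break distance ties by node id
    )[:k]
    if not voters:
        return None, 0.0
    label, votes = Counter(labels[v] for v in voters).most_common(1)[0]
    return label, votes / len(voters)


def cascade(dist, known, k=2, threshold=1.0, max_rounds=10):
    """Run voting rounds, promoting confident predictions to voters."""
    labels = dict(known)
    assigned_round = {}
    unlabeled = set(range(len(dist))) - set(labels)
    for rnd in range(1, max_rounds + 1):
        # All predictions in a round use the labels known at its start.
        preds = {u: majority_vote(u, dist, labels, k) for u in sorted(unlabeled)}
        confident = {
            u: lab
            for u, (lab, conf) in preds.items()
            if lab is not None and conf >= threshold
        }
        if not confident:
            break  # nothing cleared the threshold, so the cascade stops
        labels.update(confident)
        for u in confident:
            assigned_round[u] = rnd
        unlabeled -= set(confident)
    # Fallback: label whatever is left by a plain majority vote.
    leftovers = {u: majority_vote(u, dist, labels, k)[0] for u in sorted(unlabeled)}
    for u, lab in leftovers.items():
        if lab is not None:
            labels[u] = lab
            assigned_round[u] = None  # None marks a low-confidence fallback
    return labels, assigned_round


# Toy data: five nodes on a line; absolute difference plays the role of DSD.
pos = [0, 2, 19, 4, 10]
dist = [[abs(a - b) for b in pos] for a in pos]
known = {0: "A", 1: "A", 2: "B"}
labels, assigned_round = cascade(dist, known, k=2, threshold=1.0)
```

In this toy example node 3 is labeled confidently in the first round, and only then does node 4 receive a unanimous vote, because node 3 displaces the more distant B-labeled voter. That displacement is the effect cascading is meant to exploit, shown here under the simplifying assumptions stated above.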