A Balanced Hierarchical Multi-label Classification Method for Protein Function Prediction
Abstract: Protein function prediction is essential for enhancing our understanding of biological processes and advancing biology and medicine. Recent computational models have shown impressive accuracy in classifying protein sequences into hundreds or thousands of functional classes. However, these functions are often organized into a large, unbalanced hierarchical structure, like that defined by Gene Ontology (GO), which can introduce prior errors and cause rare functional classes to be neglected. Many existing models also rely on intermediate protein structures for predictions, making them time-consuming and prone to inaccuracies for unknown proteins. In this study, we introduce the MTP (Metric-learning then Pruning) model, which uses metric learning and focuses on bottom-level annotations. We then apply a pruning step that excludes labels misclassified during metric learning, to enhance prediction accuracy and reliability. This article not only presents the novel method but also proposes a way of approaching function prediction that suits hierarchical structures such as GO terms. Our validation shows that MTP significantly outperforms many contemporary models, particularly in predicting rare functional classes.
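The abstract does not give implementation details, so the following is only a generic illustration of the idea it describes, not the MTP model itself. It assumes that bottom-level (leaf) classes are scored by cosine similarity between a query embedding and one prototype embedding per leaf, that leaves scoring below a threshold are pruned, and that the surviving leaves are expanded to their ancestors in the hierarchy. The toy hierarchy, the prototype vectors, the `threshold` value, and the function names are all hypothetical.

```python
import math

# Hypothetical toy hierarchy (child -> parents); these are not real GO terms.
PARENTS = {
    "leaf_a": ["mid_1"],
    "leaf_b": ["mid_1"],
    "leaf_c": ["mid_2"],
    "mid_1": ["root"],
    "mid_2": ["root"],
    "root": [],
}

# Hypothetical prototype embeddings, one per bottom-level class.
PROTOTYPES = {
    "leaf_a": [1.0, 0.0],
    "leaf_b": [0.0, 1.0],
    "leaf_c": [-1.0, 0.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links from `term`."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def predict(query, leaf_prototypes, threshold=0.8):
    """Score leaves by similarity, prune weak ones, then add ancestors.

    The threshold-based pruning rule is an assumption made for illustration.
    """
    scores = {leaf: cosine(query, emb) for leaf, emb in leaf_prototypes.items()}
    kept = {leaf for leaf, score in scores.items() if score >= threshold}
    labels = set(kept)
    for leaf in kept:
        labels |= ancestors(leaf)
    return labels, scores


labels, scores = predict([1.0, 0.1], PROTOTYPES)
print(sorted(labels))
```

Predicting only at the bottom level and deriving the upper levels from the hierarchy means that frequent, high-level classes never compete directly with rare leaf classes during scoring, which is one plausible reading of how a bottom-level focus could help with class imbalance.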