Everyone since 13 Oct 2023
Proteins, central to biological systems, are complex: their sequences, structures, and functions interact in ways shaped by physics and evolution, which makes accurate function prediction challenging. Recent deep learning techniques show substantial potential for precise function prediction by learning representations from large collections of protein sequences and structures. Nevertheless, practical function annotation still relies heavily on modeling protein similarity with sequence or structure retrieval tools, given their accuracy and interpretability. To study the effect of inter-protein similarity modeling, we comprehensively benchmark retriever-based methods against predictors on protein function tasks and demonstrate the strength of retriever-based approaches. Inspired by these findings, we introduce ProtIR, a variational pseudo-likelihood framework designed to improve function prediction through iterative refinement between predictors and retrievers. ProtIR combines the strengths of both predictors and retrievers, yielding an improvement of around 10% over vanilla predictor-based methods. Furthermore, it achieves performance comparable to state-of-the-art protein language model-based methods with significantly less training time, highlighting the efficacy of our approach.