SOFM-Top: Protein Remote Homology Detection and Fold Recognition Based on Sequence-Order Frequency Matrix
Abstract: Protein remote homology detection and fold recognition are critical for studying protein structure and function. Profile-based methods currently achieve state-of-the-art performance in this field; they rely on widely used sequence profiles such as the Position-Specific Frequency Matrix (PSFM) and the Position-Specific Scoring Matrix (PSSM). However, these profiles ignore sequence-order effects along the protein sequence. In this study, we propose a novel profile, the Sequence-Order Frequency Matrix (SOFM), which incorporates sequence-order information while extracting evolutionary information from a Multiple Sequence Alignment (MSA). Statistical tests and experimental results demonstrate its effectiveness. Combining SOFM with the previously proposed Top-n-grams approach, we applied it to remote homology detection and fold recognition and built a computational predictor called SOFM-Top. Evaluated on four benchmark datasets, SOFM-Top outperformed other state-of-the-art methods in this field, indicating that it is a more useful tool and that SOFM is a richer representation than PSFM and PSSM. Because profiles are widely used to construct computational predictors in studies of protein structure and function, SOFM has many potential applications.
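To make concrete what it means for a position-specific profile to ignore sequence order, the sketch below computes a plain PSFM from a toy alignment: each column's amino acid frequencies are counted independently of its neighbouring columns. This illustrates the conventional PSFM only, not the SOFM construction, which the abstract does not specify. The function name `psfm`, the toy alignment, and the choice to skip gap characters are assumptions made for illustration.

```python
from collections import Counter

# The 20 standard amino acids; this string fixes the column order of the matrix.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def psfm(msa):
    """Return a position-specific frequency matrix for an aligned set of sequences.

    The result has one row per alignment column and one entry per amino acid
    (in AMINO_ACIDS order). Gaps and non-standard symbols are skipped
    (an illustrative assumption, not a rule taken from the paper).
    """
    if not msa:
        raise ValueError("alignment is empty")
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")

    matrix = []
    for col in range(length):
        # Each column is counted on its own: no information from
        # neighbouring columns enters the frequencies.
        counts = Counter(seq[col] for seq in msa if seq[col] in AMINO_ACIDS)
        total = sum(counts.values())
        matrix.append([counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS])
    return matrix


# Hypothetical toy alignment of four sequences, three columns wide.
toy_msa = ["MKV", "MRV", "MK-", "LKI"]
profile = psfm(toy_msa)
```

Because every row of `profile` depends only on its own column, shuffling the columns of the alignment merely shuffles the rows of the matrix. That independence is the limitation the abstract attributes to PSFM and PSSM.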