A Novel Approach for Prediction of Protein Subcellular Localization from Sequence Using Fourier Analysis and Support Vector Machines
Abstract: A novel method is presented for the prediction of protein subcellular localization from sequence using Fourier analysis and support vector machines. To extract the features of a protein sequence, each amino acid is replaced by its value on a hydrophobicity scale, and a fast Fourier transform is then applied to the numerically encoded sequence. The transformed sequence data are used as the input for training support vector machines to predict subcellular localization. The motivation for this encoding rests on (1) the fact that periodicities are critically important factors in protein structure and (2) the ability of the method to capture information about long-range correlations and global symmetries, which are completely missed by approaches based on global amino acid composition. Our method is evaluated against the integrated system PSORT-B for the prediction of subcellular localizations of proteins in Gram-negative bacteria. In a 5-fold cross-validation, the new method outperforms PSORT-B in prediction for the inner membrane, outer membrane, and extracellular localizations. It is expected that integrated systems such as PSORT-B may benefit from inclusion of the individual predictor presented in this paper.
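The encoding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions that the abstract does not specify: the Kyte-Doolittle hydropathy scale, zero-padding or truncation to a fixed power-of-two length, and the use of coefficient magnitudes as features. The function names `encode`, `fft`, and `spectrum_features` are hypothetical.

```python
import cmath

# Kyte-Doolittle hydropathy values. The choice of scale is an assumption;
# the abstract does not name the hydrophobicity scale that was used.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence):
    """Replace each residue by its hydrophobicity value (unknown residues -> 0.0)."""
    return [KYTE_DOOLITTLE.get(aa, 0.0) for aa in sequence.upper()]


def fft(values):
    """Recursive radix-2 Cooley-Tukey FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def spectrum_features(sequence, n_points=512):
    """Return a fixed-length feature vector of FFT magnitudes.

    Assumptions: the encoded sequence is truncated or zero-padded to
    n_points, and only the magnitudes of the non-redundant half of the
    spectrum (n_points // 2 + 1 values) are kept.
    """
    if n_points < 1 or n_points & (n_points - 1):
        raise ValueError("n_points must be a power of two")
    signal = encode(sequence)[:n_points]
    signal += [0.0] * (n_points - len(signal))
    coeffs = fft(signal)
    return [abs(c) for c in coeffs[: n_points // 2 + 1]]
```

A vector such as `spectrum_features(seq)` would then be computed for each protein and passed, together with its localization label, to a support vector machine trainer. A strictly alternating hydrophobic/hydrophilic sequence concentrates its energy in the highest-frequency coefficient, which is the kind of periodicity that a plain amino acid composition vector cannot distinguish from a shuffled sequence.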