A New Method for Predicting Protein Secondary Structures Based on Stochastic Tree Grammars

Naoki Abe, Hiroshi Mamitsuka

1994 (modified: 16 Jul 2019), ICML 1994

Abstract: We propose a new method for predicting the secondary structure of a protein from its amino acid sequence, based on a training algorithm for the probability parameters of a certain type of stochastic tree grammar. In particular, we concentrate on the problem of predicting β-sheet regions, which has previously been considered difficult because of the unbounded dependencies exhibited by sequences corresponding to β-sheets. To cope with this difficulty, we use a new family of stochastic tree grammars, which we call Stochastic Ranked Node Rewriting Grammars (SRNRG). These grammars are powerful enough to capture the types of dependencies exhibited by the sequences of β-sheet regions, such as the ‘parallel’ and ‘anti-parallel’ dependencies and their combinations. Our learning algorithm is an adaptation of the ‘Inside-Outside’ algorithm (for stochastic CFGs) to SRNRG, with two significant modifications. First, by placing a restriction on the form of SRNRG, we obtain a simpler and faster learning algorithm. Second, the algorithm is equipped with a new iterative method for reducing the alphabet size (i.e. the number of amino acids) by clustering amino acids according to their physico-chemical properties. Our preliminary experiments indicate that our method is able to capture and generalize the kind of long-distance dependencies exhibited by β-sheets, which was not previously possible. Indeed, our method was able to predict the β-sheet regions of a protein that is less than 25 percent homologous to the sequences in the training data.
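As an informal illustration of the ‘parallel’ and ‘anti-parallel’ dependencies mentioned in the abstract, the following minimal sketch pairs the residues of two equal-length strands and checks whether the resulting pairs cross. It is not the authors' algorithm and does not implement SRNRG; the function names `strand_pairs` and `has_crossing`, the strand positions, and the strand length are all hypothetical choices made for this example. Under these assumptions, anti-parallel pairing gives nested pairs (the pattern a context-free grammar can express), while parallel pairing gives crossing pairs, which is the kind of dependency that calls for a more powerful grammar family.

```python
from itertools import combinations


def strand_pairs(start1, start2, length, antiparallel):
    """Return (i, j) index pairs linking residues of two equal-length strands.

    Hypothetical helper for illustration only: strand 1 occupies positions
    start1 .. start1+length-1 and strand 2 occupies start2 .. start2+length-1.
    """
    pairs = []
    for k in range(length):
        i = start1 + k
        if antiparallel:
            # Strand 2 runs in the opposite direction: first pairs with last.
            j = start2 + (length - 1 - k)
        else:
            # Strand 2 runs in the same direction: k-th pairs with k-th.
            j = start2 + k
        pairs.append((i, j))
    return pairs


def has_crossing(pairs):
    """Return True if any two pairs interleave as a < c < b < d."""
    for (a, b), (c, d) in combinations(sorted(pairs), 2):
        if a < c < b < d:
            return True
    return False


# Two strands of length 4 at (assumed) positions 0-3 and 10-13.
anti = strand_pairs(0, 10, 4, antiparallel=True)
para = strand_pairs(0, 10, 4, antiparallel=False)

print("anti-parallel pairs:", anti, "crossing:", has_crossing(anti))
print("parallel pairs:     ", para, "crossing:", has_crossing(para))
```

In this toy setting the anti-parallel pairs nest inside one another, much like matching brackets, whereas the parallel pairs interleave in the manner of a copied string.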
