Redundancy-weighting for better inference of protein structural features

Chen Yanover, Natalia Vanetik, Michael Levitt, Rachel Kolodny, Chen Keasar

2014 (modified: 19 Jan 2022), Bioinform. 2014

Abstract: Structural knowledge, extracted from the Protein Data Bank (PDB), underlies numerous potential functions and prediction methods. The PDB, however, is highly biased: many proteins have more than one entry, while entire protein families are represented by a single structure, or by none at all. The standard solution to this problem is to limit studies to non-redundant subsets of the PDB. While this alleviates the biases, it hides the many-to-many relations between sequences and structures. That is, non-redundant datasets conceal both the diversity of sequences that share the same fold and the existence of multiple conformations for the same protein. A particularly disturbing aspect of non-redundant subsets is that they hardly benefit from the rapid pace of protein structure determination, because most newly solved structures fall within existing families.
