ProSol DB: A Protein Solubility Database

Sara A. Amin, Venkatesh Endalur Gopinarayanan, Nikhil U. Nair, Soha Hassoun

Published: 2017, Last Modified: 12 May 2023, BCB 2017

Abstract: Engineering non-native synthesis pathways in microbial hosts has shown promise for producing commercially useful molecules. Selecting highly soluble protein sequences to catalyze the reactions along a synthesis pathway can be facilitated by predicting the solubility of candidate sequences in the host. Current solubility predictors apply machine-learning algorithms, such as Support Vector Machines (SVM) and Neural Networks (NN), to protein sequence features such as hydrophilicity, net charge and α-helix content. These features are used to build classifiers [1, 2], single-layer logistic regression models [3], or more sophisticated multi-layer models [4, 5] that predict protein solubility. In this poster we present the Protein Solubility Database (ProSol DB), which allows quick lookup of predicted solubility values by Enzyme Commission (EC) number. We used ccSOL omics [6] to compute solubility prediction scores in E. coli for various proteins from UniProtKB. ProSol DB serves as a source for identifying protein sequences with high predicted solubility scores, eliminating the need for repeated calls to protein solubility predictors. Combining ProSol DB with synthesis pathway tools can help avoid experimental effort spent expressing low-solubility enzymes when more soluble alternatives are available. This work promises to expedite the design-build-test cycle of metabolic engineering efforts.
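The lookup described in the abstract (solubility scores computed once, retrieved by EC number, then used to choose a sequence for each pathway step) can be sketched as follows. This is a minimal in-memory illustration, not the ProSol DB implementation or schema: the class and method names, the placeholder EC keys, the accessions, and the scores are all hypothetical toy values rather than ccSOL omics output, and the sketch assumes that a higher score means higher predicted solubility.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SolubilityRecord:
    """One precomputed prediction: a protein accession and its score."""
    accession: str
    score: float  # assumption: higher score = more soluble in the host


class SolubilityIndex:
    """Hypothetical stand-in for a table of predicted solubility keyed by EC number."""

    def __init__(self) -> None:
        self._by_ec: dict[str, list[SolubilityRecord]] = defaultdict(list)

    def add(self, ec_number: str, accession: str, score: float) -> None:
        # Scores are stored once; later queries never re-run the predictor.
        self._by_ec[ec_number].append(SolubilityRecord(accession, score))

    def lookup(self, ec_number: str,
               min_score: float = float("-inf")) -> list[SolubilityRecord]:
        """Return records for an EC number at or above min_score, most soluble first."""
        hits = [r for r in self._by_ec.get(ec_number, []) if r.score >= min_score]
        return sorted(hits, key=lambda r: r.score, reverse=True)

    def best_per_step(self, pathway: list[str]) -> dict[str, SolubilityRecord | None]:
        """For each EC number along a pathway, pick the top-scoring sequence (None if absent)."""
        best: dict[str, SolubilityRecord | None] = {}
        for ec in pathway:
            hits = self.lookup(ec)
            best[ec] = hits[0] if hits else None
        return best


# Usage with toy placeholder values (not real EC numbers, accessions, or ccSOL scores).
index = SolubilityIndex()
index.add("EC_A", "ACC_1", 0.42)
index.add("EC_A", "ACC_2", 0.87)
index.add("EC_B", "ACC_3", 0.55)

print([r.accession for r in index.lookup("EC_A")])  # ['ACC_2', 'ACC_1']
```

Keeping the ranking in the lookup rather than in the stored table means a caller can apply a different score threshold per query, and `best_per_step` shows how a pathway tool might consume the table one EC number at a time.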
