SecretoGen: towards prediction of signal peptides for efficient protein secretion

Published: 27 Oct 2023, Last Modified: 21 Nov 2023, GenBio@NeurIPS2023 Poster
Keywords: transformer, protein, signal peptide, secretion, protein design, sequence generation
TL;DR: A generative transformer trained on millions of natural signal peptides shows good performance for ranking signal peptides by secretion efficiency.
Abstract: Signal peptides (SPs) are short sequences at the N terminus of proteins that control their secretion in all living organisms. Secretion is of great importance in biotechnology, as industrial production of proteins in host organisms often requires the proteins to be secreted. SPs vary in secretion efficiency, which depends on both the host organism and the protein they are combined with. Therefore, to optimize production yields, an efficient SP must be identified for each protein. While SPs can be predicted accurately by machine learning models, such models have so far shown limited utility for predicting secretion efficiency. We introduce **SecretoGen**, a generative transformer trained on millions of naturally occurring SPs from diverse organisms. Evaluation on a range of secretion efficiency datasets shows that SecretoGen's perplexity is a promising criterion for selecting efficient SPs, without requiring training on experimental efficiency data.
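The selection idea in the abstract, ranking candidate SPs by a generative model's perplexity, can be sketched as follows. Perplexity is the exponential of the mean negative log-likelihood per predicted token, PPL = exp(-(1/N) Σ log p(x_i | x_<i)). This is an illustrative sketch under stated assumptions, not the authors' implementation: a smoothed amino-acid bigram model stands in for the SecretoGen transformer, the sequences are made-up toy strings rather than real SPs, lower perplexity is assumed to indicate a more promising SP, and the function names (`train_bigram`, `perplexity`, `rank_by_perplexity`) are hypothetical.

```python
import math
from collections import Counter, defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BOS, EOS = "<", ">"  # start- and end-of-sequence markers


def train_bigram(sequences, alpha=1.0):
    """Hypothetical stand-in for a generative SP model (NOT SecretoGen):
    an add-alpha smoothed bigram model over the 20 amino acids.
    Returns a function log_prob(prev, nxt) = log P(nxt | prev)."""
    vocab_size = len(AMINO_ACIDS) + 1  # 20 amino acids + EOS
    counts = defaultdict(Counter)
    for seq in sequences:
        tokens = [BOS] + list(seq) + [EOS]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1

    def log_prob(prev, nxt):
        row = counts.get(prev, Counter())  # unseen context -> uniform
        total = sum(row.values())
        return math.log((row[nxt] + alpha) / (total + alpha * vocab_size))

    return log_prob


def perplexity(seq, log_prob):
    """exp of the mean negative log-likelihood over all predicted tokens (incl. EOS)."""
    tokens = [BOS] + list(seq) + [EOS]
    nll = -sum(log_prob(p, n) for p, n in zip(tokens, tokens[1:]))
    return math.exp(nll / (len(tokens) - 1))


def rank_by_perplexity(candidates, log_prob):
    """Order candidate SPs from lowest to highest perplexity
    (assumption: lower perplexity = more promising SP)."""
    return sorted(candidates, key=lambda s: perplexity(s, log_prob))


# Toy usage: made-up SP-like strings, not real data.
natural = ["MKKLLAVLALALSAQA", "MKKTLLAVAALAGSAQA", "MRKLLAALVLAASAHA"]
lp = train_bigram(natural)
candidates = ["MWPDEWGYRQNC", "MKKLLALAVAASAQA"]
for sp in rank_by_perplexity(candidates, lp):
    print(sp, round(perplexity(sp, lp), 2))
```

With a real generative model, `log_prob` would presumably be replaced by the model's per-token log-probabilities for each candidate; the perplexity and ranking steps stay the same, and no experimental efficiency data is needed to produce the ranking.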
Submission Number: 1