Abstract: Engineered promoters play a crucial role in cellular factory design by enabling the expression of specific engineered genes and regulating their expression levels. However, because promoter sequence length and conservation features vary among bacterial strains, there is currently no universally applicable framework for predicting promoter expression levels across strains. To address this issue, we propose PromoterGCN, a model for predicting promoter strength across different host cells. PromoterGCN uses graph embedding vectors to handle variable-length promoter sequences, which avoids the high dimensionality of conventional encoding methods and their inability to cope with genetic variations. To validate the model's effectiveness, we introduce the WGAN-GP generative adversarial network for simulation experiments. The results show that PromoterGCN achieves Pearson correlation coefficients of 0.366 and 0.547 on natural E. coli and yeast datasets, respectively, maintaining accuracy while adapting to different promoter lengths. Additionally, when predicting the strength of artificially generated promoters, the model learns the positional preferences of k-mers in natural promoters and captures certain conserved motifs within the -10 to -35 regions.
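To illustrate why a graph representation can accommodate promoters of different lengths, the following minimal sketch builds a k-mer transition graph from a DNA sequence. It is an assumption-laden illustration, not the PromoterGCN implementation: the function names `kmer_index` and `kmer_graph`, the choice of k, and the use of adjacent k-mer transitions as edges are all hypothetical. The point it shows is that the adjacency matrix always has 4^k rows and columns, regardless of how long the input sequence is.

```python
from itertools import product

ALPHABET = "ACGT"

def kmer_index(k):
    """Map every possible k-mer over ACGT to a fixed node index (hypothetical helper)."""
    return {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}

def kmer_graph(seq, k=3):
    """Build a weighted adjacency matrix of k-mer transitions.

    Nodes are all 4**k possible k-mers; the edge weight (a, b) counts how often
    k-mer b immediately follows k-mer a (shifted by one base) in the sequence.
    This is an illustrative encoding, not the authors' method.
    """
    index = kmer_index(k)
    n = len(index)
    adj = [[0] * n for _ in range(n)]
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    for a, b in zip(kmers, kmers[1:]):
        # Skip k-mers containing ambiguous bases such as N.
        if a in index and b in index:
            adj[index[a]][index[b]] += 1
    return adj

# Two sequences of different length yield matrices of identical shape.
short_graph = kmer_graph("ACGTAC", k=2)
long_graph = kmer_graph("TTGACAATTAATCATCGGCTCGTATAATGTG", k=2)
print(len(short_graph), len(long_graph))
```

A fixed-size graph of this kind could then be passed to a graph neural network, whereas a one-hot encoding would grow with sequence length and require padding or truncation.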