Track: Biology: datasets and/or experimental results
Nature Biotechnology: No
Keywords: protein design, synthetic biology, generative models
TL;DR: We generated a functionally validated dataset of overlapping gene sequences and used it as a starting point to develop a novel generative approach for designing overlapping genes.
Abstract: Successfully designed overlapping genes are protected against mutations and horizontal gene transfer, so an effective computational design method could substantially improve the stability of genetic constructs in synthetic biology. However, designing overlapping protein sequences in an alternative reading frame is challenging: it requires substantial sequence changes to both proteins, and the sequence space is constrained by the DNA sequence they share. We present an experimental dataset of highly divergent, functionally validated overlapping gene pairs designed with the previously published method CAMEOS. We propose an iterative method for designing overlapping genes that uses the frontier generative model ESM3, and we compare its output to the experimentally validated sequences. Our results highlight the surprising effectiveness of ESM3 at predicting \textit{in vitro} protein fitness from structure information alone. Using the new approach, we generated over 2800 overlapping sequence designs whose ESM3-computed scores exceed the minimum score among the experimentally validated variants.
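The shared-DNA constraint described in the abstract can be illustrated with a minimal sketch. It is not the CAMEOS or ESM3 pipeline; it only translates one toy DNA string in two reading frames using the standard genetic code. The `translate` helper, the example sequence, and the choice of a same-strand +1 frame are all assumptions made for illustration.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate `dna` starting at offset `frame`, ignoring a trailing partial codon."""
    usable_end = len(dna) - (len(dna) - frame) % 3
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(frame, usable_end, 3)
    )


# Hypothetical toy sequence: both frames read the same nucleotides.
dna = "ATGGCTAAAGGT"
protein_a = translate(dna, frame=0)  # frame 0 protein
protein_b = translate(dna, frame=1)  # overlapping +1 frame protein

# A single substitution (C -> A at index 4) alters both encoded proteins.
mutant = dna[:4] + "A" + dna[5:]
mutant_a = translate(mutant, frame=0)
mutant_b = translate(mutant, frame=1)

print(protein_a, protein_b)
print(mutant_a, mutant_b)
```

Because every nucleotide belongs to a codon in each frame, one substitution here changes a residue in both proteins, which is why a design method has to score the two sequences jointly.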
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Presenter: ~Chenling_Xu1
Format: Yes, the presenting author will attend in person if this work is accepted to the workshop.
Funding: No, the presenting author of this submission does *not* fall under ICLR’s funding aims, or has sufficient alternate funding.
Submission Number: 35