Keywords: Generative Codon Optimization, Latent-space optimization, Computational protein engineering
Abstract: Codon optimization, the process of selecting synonymous codons to enhance mRNA translation efficiency and protein expression, is crucial for therapeutic protein production and mRNA-based vaccines. However, it faces two major challenges: navigating a vast, discrete combinatorial space that precludes gradient-based methods, and relying on heuristic proxies such as the Codon Adaptation Index or GC-content balancing, which often fail to capture true expression dynamics. To address these, we introduce the Latent-Space Codon Optimizer (LSCO), which reformulates the problem in a continuous latent space derived from a pretrained mRNA language model, enabling efficient gradient-based optimization. LSCO further incorporates a data-driven expression objective trained on mRNA-protein expression data, regularized by a minimum free energy (MFE) term for structural stability, and employs constrained decoding to ensure that each optimized mRNA still encodes the target protein. Evaluated on two mRNA-protein expression datasets, LSCO outperforms baselines, including frequency-based methods and recent naturalness-driven learned codon optimizers, in predicted expression yield while maintaining structural stability and host-appropriate GC content. Our results underscore LSCO's potential for advancing codon optimization by delivering mRNA sequences that achieve high predicted expression while preserving thermodynamic stability and organism-specific compatibility.
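To illustrate what constrained decoding for mRNA-protein fidelity can look like, here is a minimal Python sketch. It is not the LSCO implementation: the abstract does not specify the decoding procedure, so the greedy per-position masking below, the `constrained_decode` helper, and the per-position `scores` dictionaries (standing in for decoder outputs) are all assumptions made for illustration. The only fixed fact used is the standard genetic code.

```python
import itertools

# Standard genetic code, codons enumerated with bases in U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in itertools.product(BASES, repeat=3)]
CODON_TO_AA = dict(zip(CODONS, AMINO))


def translate(mrna):
    """Translate an mRNA string (length divisible by 3) into amino acids."""
    return "".join(CODON_TO_AA[mrna[i:i + 3]] for i in range(0, len(mrna), 3))


def constrained_decode(protein, scores):
    """Greedy decoding restricted to synonymous codons (illustrative assumption).

    protein: amino-acid string, one letter per residue.
    scores:  one dict per residue mapping codon -> score; these are
             hypothetical stand-ins for a decoder's per-position outputs.
    """
    chosen = []
    for aa, pos_scores in zip(protein, scores):
        # Mask out every codon that does not encode the required residue.
        allowed = [c for c in CODONS if CODON_TO_AA[c] == aa]
        # Pick the best-scoring codon among the allowed ones.
        chosen.append(max(allowed, key=lambda c: pos_scores.get(c, float("-inf"))))
    return "".join(chosen)


# Hypothetical scores: some non-synonymous codons score highest on purpose.
protein = "MKW"
scores = [
    {"GGG": 9.0, "AUG": 0.1},
    {"AAA": 0.2, "AAG": 0.7, "UUU": 5.0},
    {"UGG": 0.0},
]
mrna = constrained_decode(protein, scores)
print(mrna, translate(mrna))
```

Because non-synonymous codons are masked before the argmax, the decoded sequence translates back to the input protein regardless of how the scores are distributed.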
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 17500