AURORA: Alignment-Guided Mutation Proposal for Protein Engineering

Published: 28 May 2026, Last Modified: 28 May 2026
Venue: GenBio 2026 Poster
License: CC BY 4.0
Keywords: protein engineering, protein language models, multiple sequence alignment, representation learning, reinforcement learning
Abstract: Protein engineering introduces mutations to enhance protein function and has immense therapeutic, agricultural, and industrial applications, but experimental validation is expensive, limiting available data. The prevailing computational approach uses a protein foundation model for two tasks: an oracle built on its representations scores mutation effects, and a search procedure proposes mutations through reinforcement learning (RL). These approaches predominantly rely on single-sequence models, namely ESM, that predict masked amino acids, requiring them to infer mutation interactions from single sequences rather than from mutation co-evolution patterns across multiple homologous sequences. We introduce an alignment-guided mutation proposer and oracle (AURORA), a protein engineering framework with two key components. First, we investigate the natural transition to multiple sequence alignment (MSA)-based models, namely MSA Pairformer, which directly compares homologs; we quantify architectural expressivity on synthetic proteins and find that Pairformer performs better on downstream benchmarks, notably ProteinGym. Second, because representation models capture evolutionary distributions while search optimizes experimental rewards, we decouple these tasks: Pairformer scores mutants while a separate lightweight policy trained with RL proposes mutations, enabling direct multi-site proposals rather than iterative single-site search. We then validate AURORA in vitro on green fluorescent protein, training on limited data to generate novel variants that exhibit higher fluorescence than variants produced by existing methods.
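The decoupling described in the abstract (a fixed oracle that scores mutants, plus a separate lightweight policy trained with RL that proposes multi-site mutations) can be illustrated with a toy sketch. This is not the authors' implementation: `toy_oracle`, `propose`, `train_policy`, the hidden `target` sequence, and all hyperparameters are hypothetical stand-ins, and the policy here is a plain per-site categorical distribution updated with a REINFORCE-style gradient and a mean-reward baseline. A real oracle would be a learned model rather than a string comparison.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_oracle(seq, target):
    """Hypothetical stand-in for a learned oracle: fraction of sites matching a hidden target."""
    return sum(a == b for a, b in zip(seq, target)) / len(target)


def softmax(logits):
    """Numerically stable softmax over one site's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def propose(logits, rng):
    """Sample one residue index per site in a single draw, so a proposal can mutate many sites at once."""
    probs = [softmax(row) for row in logits]
    idx = [rng.choices(range(len(AMINO_ACIDS)), weights=p)[0] for p in probs]
    return idx, probs


def decode(idx):
    """Convert residue indices back to a sequence string."""
    return "".join(AMINO_ACIDS[i] for i in idx)


def train_policy(wild_type, target, steps=300, batch=16, lr=5.0, seed=0):
    """REINFORCE with a mean-reward baseline; the oracle is only queried, never updated."""
    rng = random.Random(seed)
    # Bias the initial policy toward the wild type (an assumption of this sketch).
    logits = [[2.0 if aa == wt else 0.0 for aa in AMINO_ACIDS] for wt in wild_type]
    for _ in range(steps):
        samples = [propose(logits, rng) for _ in range(batch)]
        rewards = [toy_oracle(decode(idx), target) for idx, _ in samples]
        baseline = sum(rewards) / batch
        for (idx, probs), reward in zip(samples, rewards):
            advantage = (reward - baseline) / batch
            for site, chosen in enumerate(idx):
                for k in range(len(AMINO_ACIDS)):
                    # Gradient of log softmax probability of the chosen residue w.r.t. logit k.
                    grad = (1.0 if k == chosen else 0.0) - probs[site][k]
                    logits[site][k] += lr * advantage * grad
    return logits
```

Calling `train_policy("MKTAY", "MKWAF")` returns per-site logits, and `propose(logits, random.Random(0))` then draws a full-length candidate in one step. Keeping the scorer and the proposer as separate objects is the point of the sketch: the proposer can be retrained against rewards without touching the scoring model.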
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 170