Mutagenic: An Embedding-Based Approach to Protein Masking for Functional Redesign

Published: 06 Mar 2025, Last Modified: 18 Apr 2025, Venue: ICLR 2025 Workshop LMRL, License: CC BY 4.0
Track: Tiny Paper Track
Keywords: protein language models, protein engineering, functional redesign
TL;DR: We propose a novel embedding-based masking approach for protein design with protein language models that generalizes to any target function.
Abstract: Recent advances in language models have been applied to protein sequences, motivated by the critical functions of proteins in biological processes and by the availability of large datasets. Protein engineering has already proven impactful in areas such as therapeutics, agriculture, the environment, and bio-manufacturing. Motivated by the challenge of protein design, this paper investigates the following question: how can we efficiently identify which residues to edit when engineering proteins with specific target functions? We propose a novel embedding-based masking approach that edits a given protein to achieve a new target function. More formally, let $F = \{f_1, f_2, \dots, f_n\}$ denote the set of possible protein functions. Given a protein sequence $s = s_1 s_2 \dots s_N$ with function $f \in F$, composed of amino acids $\{s_i\}_{i=1}^N$, and a target function $f^\prime \in F$, our goal is to return a new protein sequence $s^\prime$ with function $f^\prime$.
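The abstract states the problem but does not specify how residues are chosen for masking. The sketch below illustrates one way an embedding-based selection step could look. It is not the authors' method: the function names, the centroid vectors for $f$ and $f^\prime$, the cosine-difference score, and the `<mask>` token are all assumptions made for the example.

```python
import math

MASK_TOKEN = "<mask>"  # hypothetical placeholder; real tokenizers define their own


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def select_mask_positions(residue_embeddings, source_centroid, target_centroid, k):
    """Return the k positions most aligned with f and least aligned with f'.

    Assumed heuristic: score_i = cos(e_i, source) - cos(e_i, target).
    """
    if not 0 <= k <= len(residue_embeddings):
        raise ValueError("k must be between 0 and the sequence length")
    scores = [
        cosine(e, source_centroid) - cosine(e, target_centroid)
        for e in residue_embeddings
    ]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def mask_sequence(sequence, positions):
    """Replace the residues at the chosen positions with the mask token."""
    chosen = set(positions)
    return [MASK_TOKEN if i in chosen else aa for i, aa in enumerate(sequence)]


# Toy data: a 4-residue sequence with made-up 2-D per-residue embeddings.
sequence = "MKTA"
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
positions = select_mask_positions(embeddings, [1.0, 0.0], [0.0, 1.0], k=2)
masked = mask_sequence(sequence, positions)
```

In a full pipeline, the masked positions would presumably be filled in by a protein language model to propose $s^\prime$. That step is omitted here because the abstract does not describe it.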
Attendance: Robin Pan
Submission Number: 91