Curiosity Driven Protein Sequence Generation via Reinforcement Learning

23 Sept 2023 (modified: 11 Feb 2024), Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Protein Sequence Design, RL
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Protein sequence design is a critical problem in bioengineering and biotechnology. However, the search space for protein sequences is vast and sparsely populated, which poses significant challenges. Moreover, generative models struggle to adapt to different usage scenarios and objectives, which limits their adaptability and generalization. To address these challenges, we explore a latent-space reinforcement learning algorithm that enables protein sequence generation and mutation across different scenarios. Our approach has two main advantages: (1) reinforcement learning lets us adjust the reward function to different tasks and scenarios, so the model can generate and mutate protein sequences in a targeted manner; (2) the latent space produced by ESM-2 is continuous, unlike the original sparse, discrete sequence space, and a curiosity mechanism further improves search efficiency. We evaluate our method in distinct scenarios involving different protein functions and sequences, and the experimental results demonstrate significant performance improvements over existing methods. Multiple ablation studies validate our design choices.
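The abstract does not specify which curiosity mechanism is used or how the reward is composed. As a hedged illustration of the general idea (a swappable task reward plus a curiosity bonus, computed on continuous latent vectors), the following minimal sketch assumes a Random Network Distillation (RND) style novelty bonus with tiny linear networks, additive perturbations as latent-space actions, and a hypothetical distance-to-goal task reward. The names `RNDCuriosity`, `shaped_reward`, `step`, `DIM`, and `beta` are illustrative assumptions, not the submission's implementation, and decoding latents back to amino-acid sequences is omitted.

```python
import math
import random

# Hypothetical latent width for illustration; real ESM-2 embeddings are much wider.
DIM = 8


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class RNDCuriosity:
    """RND-style novelty bonus on latent vectors (an assumed mechanism).

    A fixed random 'target' map is compared with a trainable linear
    'predictor'. States the predictor has been trained on yield a small
    error; unfamiliar states yield a larger one, which serves as the bonus.
    """

    def __init__(self, dim, out_dim=4, lr=0.05, seed=0):
        rng = random.Random(seed)
        self.target = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(out_dim)]
        self.pred = [[0.0] * dim for _ in range(out_dim)]
        self.lr = lr

    def _target(self, z):
        return [math.tanh(v) for v in matvec(self.target, z)]

    def bonus(self, z):
        """Mean squared error between predictor and fixed target at z."""
        t, p = self._target(z), matvec(self.pred, z)
        return sum((pi - ti) ** 2 for pi, ti in zip(p, t)) / len(t)

    def update(self, z):
        """One gradient-descent step on the mean squared prediction error at z."""
        t, p = self._target(z), matvec(self.pred, z)
        k = len(t)
        for i, row in enumerate(self.pred):
            g = 2.0 * (p[i] - t[i]) / k
            for j in range(len(z)):
                row[j] -= self.lr * g * z[j]


def shaped_reward(task_reward, z_next, curiosity, beta=0.1):
    """Task-specific reward plus a weighted curiosity bonus (beta is an assumed value)."""
    return task_reward(z_next) + beta * curiosity.bonus(z_next)


def step(z, action):
    """Toy latent-space transition: the action is an additive perturbation of z."""
    return [zi + ai for zi, ai in zip(z, action)]


if __name__ == "__main__":
    cur = RNDCuriosity(DIM)
    # Hypothetical task: move toward a goal embedding (stand-in for a fitness predictor).
    goal = [0.5] * DIM

    def task(z):
        return -math.dist(z, goal)

    rng = random.Random(1)
    z = [0.0] * DIM
    for _ in range(5):
        action = [rng.uniform(-0.1, 0.1) for _ in range(DIM)]
        z_next = step(z, action)
        r = shaped_reward(task, z_next, cur)
        cur.update(z_next)  # visited states become less novel over time
        z = z_next
        print(round(r, 4))
```

Retargeting the search to a different objective only requires passing a different `task` function, while the curiosity term stays the same. Because the bonus shrinks for latents the predictor has already been trained on, it pushes exploration toward unvisited regions of the continuous space.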
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7680