PRISM: Enhancing Protein Inverse Folding through Fine-Grained Retrieval on Structure-Sequence Multimodal Representations

Published: 06 Mar 2025, Last Modified: 23 Mar 2025 | ICLR 2025 Workshop MLMP Poster | CC BY 4.0
Track: Short paper
Keywords: Retrieval Augmented Generation, Protein Inverse Folding, Protein Sequence Design, Multimodal Representation
TL;DR: We present PRISM, a multimodal retrieval-augmented generation framework that enhances protein inverse folding by dynamically integrating fine-grained structure-sequence multimodal representations from a larger protein database.
Abstract: 3D structure-conditioned protein sequence generation, also known as protein inverse folding, is a key challenge in computational biology. While large language models for proteins have made significant strides, they cannot dynamically integrate rich multimodal representations from existing datasets, specifically the combined information of 3D structure and 1D sequence. Additionally, as datasets grow, these models require retraining, leading to inefficiencies. In this paper, we introduce PRISM, a novel retrieval-augmented generation (RAG) framework that enhances protein sequence design by dynamically incorporating fine-grained multimodal representations from a larger set of known structure-sequence pairs. Our experiments demonstrate that PRISM significantly outperforms state-of-the-art techniques in sequence recovery, emphasizing the advantages of incorporating fine-grained, multimodal retrieval-augmented generation in protein design.
Presenter: ~Sazan_Mahbub1
Submission Number: 36
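
The abstract reports results in terms of sequence recovery. As a minimal sketch, assuming the common definition (the fraction of positions at which the designed residue matches the native one), the metric can be computed as below. The function name and the toy sequences are illustrative and not taken from the paper; the paper's exact evaluation protocol (for example, how scores are aggregated across proteins) is not stated in the abstract.

```python
def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of positions where the designed residue equals the native residue.

    Assumes both sequences come from the same backbone, so they are already
    aligned position by position and have equal length.
    """
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


# Toy sequences for illustration only; not real experimental data.
native = "MKTAYIAKQR"
designed = "MKSAYLAKQR"
print(f"{sequence_recovery(designed, native):.2f}")  # prints 0.80
```

Here two of the ten positions differ (T/S and I/L), so the recovery is 8/10.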