ProtQueSt: Query-Conditioned Retrieval-Augmented Generation for Protein Function Annotation

Linrui Ma; Yiwei Liang; Yishu Yu; Chuhan Joyce Qi

Published: 28 May 2026, Last Modified: 28 May 2026 | GenBio 2026 Spotlight | CC BY 4.0

Keywords: Protein function prediction, query conditioning, contrastive learning, retrieval-augmented generation, protein–text alignment, protein language models

TL;DR: ProtQueSt retrieves protein function annotations through a query-conditioned contrastive retriever with FiLM modulation, learning to surface annotations relevant to specific biological questions.

Abstract: Protein function prediction from sequence is inherently query-dependent: the same protein may be characterized by its catalytic activity, domain architecture, or cellular localization depending on the biological question. Prior work has shown that large language models consistently underperform retrieval-based methods on this task, yet simple retrieval transfers annotations from embedding-space neighbors without adapting to the query. We introduce ProtQueSt, a retrieval-augmented framework that pairs a structure-aware retriever with a query-conditioned contrastive retriever that aligns protein and annotation representations via Feature-wise Linear Modulation (FiLM). FiLM conditioning and query-pooled negative sampling prove jointly essential, as neither alone improves over the structural baseline. ProtQueSt achieves the highest Entity-BLEU (48.79, +37% over RAPM) and LLM-as-a-judge score reported on Prot-Inst-OOD. This result supports reframing text-based protein understanding as a query-conditioned retrieval problem rather than a single fixed sequence-to-text mapping.
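As a rough illustration of the mechanism named in the abstract, the sketch below applies FiLM (a per-feature scale and shift predicted from the query) to a protein embedding and scores it against annotation embeddings with a contrastive loss. The abstract does not specify the architecture, so everything here is an assumption made for illustration: the embedding sizes, the single linear projections `W_gamma` and `W_beta`, the InfoNCE-style loss, and the temperature. It uses plain in-batch negatives and does not model the paper's query-pooled negative sampling.

```python
import numpy as np

def film(h, q, W_gamma, W_beta):
    """FiLM: scale and shift each feature of h using values predicted from q."""
    gamma = q @ W_gamma  # per-feature scale, shape (batch, d)
    beta = q @ W_beta    # per-feature shift, shape (batch, d)
    return gamma * h + beta

def l2_normalize(x, eps=1e-8):
    """Normalize rows to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def info_nce(protein, annotation, temperature=0.07):
    """Contrastive loss with in-batch negatives; row i matches column i."""
    logits = l2_normalize(protein) @ l2_normalize(annotation).T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

# Hypothetical sizes and random stand-ins for real encoder outputs.
rng = np.random.default_rng(0)
batch, d_prot, d_query = 4, 8, 6
h = rng.normal(size=(batch, d_prot))            # protein embeddings
q = rng.normal(size=(batch, d_query))           # query embeddings
annotations = rng.normal(size=(batch, d_prot))  # annotation embeddings
W_gamma = rng.normal(size=(d_query, d_prot)) * 0.1
W_beta = rng.normal(size=(d_query, d_prot)) * 0.1

conditioned = film(h, q, W_gamma, W_beta)  # query-conditioned protein view
loss = info_nce(conditioned, annotations)
```

Because `gamma` and `beta` depend on the query, the same protein embedding maps to different points in the shared space for different questions, which is the property the abstract attributes to query conditioning.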

Submission Number: 185