Composing Protein Evidence: A Benchmark for Fine-Grained Protein-Text Understanding

ACL ARR 2026 May Submission 15437 Authors

26 May 2026 (modified: 13 Jun 2026), ACL ARR 2026 May Submission, CC BY 4.0

Keywords: Protein Understanding, Multimodal LLM, Benchmark, Fine-grained

Abstract: Protein-text models should not only recognize what a protein does, but also connect the features that support reasoning about it. Existing evaluations often score final labels, local features, or protein-text matches separately, leaving it unclear whether a model can predict composed features. We introduce PannotGround, a benchmark for systematic global-local protein evidence evaluation. PannotGround projects curated protein records into three aspects, then composes them into protein-to-text and text-to-protein tasks with biologically structured hard negatives. This design goes beyond plausible protein-text matching by requiring compositional features to be jointly consistent with the queried protein. Across protein language models, protein-text alignment models, protein-LLMs, and text LLMs, we find that single-level performance does not reliably transfer to compositional binding. Alignment models are the strongest but remain sensitive to local-evidence contrasts, and many errors are biologically related near misses rather than random failures. PannotGround provides a diagnostic benchmark for fine-grained protein-text understanding and exposes global-local evidence binding as a central bottleneck for current models.

Paper Type: Long

Research Area: Clinical and Biomedical Applications

Research Area Keywords: Protein Understanding, Multimodal, LLM, Biology

Languages Studied: Protein language, natural language (English)

Previous URL: https://openreview.net/forum?id=frnbqlEvLL

Explanation Of Revisions PDF: pdf

Reassignment Request Area Chair: Yes, I want a different area chair for our submission

Reassignment Request Reviewers: Yes, I want a different set of reviewers

Visa Needs: yes

A1 Limitations Section: This paper has a limitations section.

A2 Potential Risks: Yes

A2 Elaboration: In statement

B Use Or Create Scientific Artifacts: Yes

B1 Cite Creators Of Artifacts: Yes

B1 Elaboration: All artifacts used are cited in Sections 1-2

B2 Discuss The License For Artifacts: Yes

B2 Elaboration: At the end of ethics

B3 Artifact Use Consistent With Intended Use: Yes

B3 Elaboration: All artifact use is based on research needs and is consistent with the artifacts' publicly stated intended use

B4 Data Contains Personally Identifying Info Or Offensive Content: No

B4 Elaboration: No personal information

B5 Documentation Of Artifacts: No

B5 Elaboration: Not related

B6 Statistics For Data: Yes

B6 Elaboration: Mainly in appendix

C Computational Experiments: Yes

C1 Model Size And Budget: Yes

C1 Elaboration: In Sections 4-5 and the corresponding appendix

C2 Experimental Setup And Hyperparameters: Yes

C2 Elaboration: In section 4-5 and appendix

C3 Descriptive Statistics: Yes

C3 Elaboration: In section 3-5

C4 Parameters For Packages: Yes

C4 Elaboration: In section 3-5 and appendix

D Human Subjects Including Annotators: No

D1 Instructions Given To Participants: N/A

D2 Recruitment And Payment: No

D2 Elaboration: No recruitment

D3 Data Consent: N/A

D4 Ethics Review Board Approval: Yes

D4 Elaboration: In Impact/Ethics Statement

E Ai Assistants In Research Or Writing: Yes

E1 Information About Use Of Ai Assistants: Yes

Author Submission Checklist: yes

EMNLP 2026 AI Reviewing Experiment: yes

Submission Number: 15437
