Composing Protein Evidence: A Benchmark for Fine-Grained Protein-Text Understanding

ACL ARR 2026 May Submission 15437 Authors

26 May 2026 (modified: 13 Jun 2026), ACL ARR 2026 May Submission, CC BY 4.0
Keywords: Protein Understanding, Multimodal LLM, Benchmark, Fine-grained
Abstract: Protein-text models should not only recognize what a protein does, but also connect features that support reasoning. Existing evaluations often score final labels, local features, or protein-text matches separately, leaving unclear whether a model could predict composed features. We introduce PannotGround, a benchmark for systematic global-local protein evidence evaluation. PannotGround projects curated protein records into three aspects, then composes them into protein-to-text and text-to-protein tasks with biologically structured hard negatives. This design goes beyond plausible protein-text matching by requiring compositional features to be jointly consistent with the queried protein. Across protein language models, protein-text alignment models, protein-LLMs, and text LLMs, we find that single-level performance does not reliably transfer to compositional binding. Alignment models are strongest, but remain sensitive to local-evidence contrasts, and many errors are biologically related near misses rather than random failures. PannotGround provides a diagnostic benchmark for fine-grained protein-text understanding and exposes global-local evidence binding as a central bottleneck for current models.
Paper Type: Long
Research Area: Clinical and Biomedical Applications
Research Area Keywords: Protein Understanding, Multimodal, LLM, Biology
Languages Studied: Protein language, natural language (English)
Previous URL: https://openreview.net/forum?id=frnbqlEvLL
Explanation Of Revisions PDF: pdf
Reassignment Request Area Chair: Yes, I want a different area chair for our submission
Reassignment Request Reviewers: Yes, I want a different set of reviewers
Visa Needs: yes
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: Yes
A2 Elaboration: In statement
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: All artifacts used are cited in Sections 1-2
B2 Discuss The License For Artifacts: Yes
B2 Elaboration: At the end of ethics
B3 Artifact Use Consistent With Intended Use: Yes
B3 Elaboration: All artifact use is based on research need and consistent with the artifacts' public intended use
B4 Data Contains Personally Identifying Info Or Offensive Content: No
B4 Elaboration: No personal information
B5 Documentation Of Artifacts: No
B5 Elaboration: Not related
B6 Statistics For Data: Yes
B6 Elaboration: Mainly in appendix
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: In section 4-5 and corresponding appendix
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: In section 4-5 and appendix
C3 Descriptive Statistics: Yes
C3 Elaboration: In section 3-5
C4 Parameters For Packages: Yes
C4 Elaboration: In section 3-5 and appendix
D Human Subjects Including Annotators: No
D1 Instructions Given To Participants: N/A
D2 Recruitment And Payment: No
D2 Elaboration: No recruitment
D3 Data Consent: N/A
D4 Ethics Review Board Approval: Yes
D4 Elaboration: In Impact/Ethics Statement
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: Yes
Author Submission Checklist: yes
EMNLP 2026 AI Reviewing Experiment: yes
Submission Number: 15437