MicrobeQuest: A Multimodal Benchmark for Information Retrieval in Microbiology

ACL ARR 2025 May Submission4918 Authors

20 May 2025 (modified: 03 Jul 2025), ACL ARR 2025 May Submission, CC BY 4.0
Abstract: AI for Science (AI4S) is reshaping research paradigms across numerous disciplines. In microbiology, multimodal data (text, images, tables, and charts) in the scientific literature and public databases can be used to understand the complex relationships between microbial strains and their environments. However, current benchmarks are either general-purpose or designed for disciplines such as materials or biomedical sciences; none is specific to microbiology. Here, we developed MicrobeQuest, the first comprehensive multimodal benchmark for microbiology-specific information retrieval, with 10,176 question-answer (QA) pairs, to take advantage of the vast amount of information available in microbiology. We first developed an expert-in-the-loop platform (MicrobeCollect) to acquire and annotate microbiological data. We then demonstrated the benchmark's utility by evaluating 17 state-of-the-art (SOTA) information retrieval (IR) methods on it. This evaluation yielded crucial performance insights and established a robust foundation for future IR advancements in microbiology. All benchmark resources, including code and datasets, are publicly available at https://github.com/acl-submission/MicrobeQuest.
Paper Type: Long
Research Area: Special Theme (conference specific)
Research Area Keywords: Information Extraction, Information Retrieval and Text Mining, Resources and Evaluation
Contribution Types: Data resources
Languages Studied: English
Submission Number: 4918