AgMMU: A Comprehensive Agricultural Multimodal Understanding Benchmark

Aruna Gauba; Irene Pi; Yunze Man; Ziqi Pang; Vikram S. Adve; Yu-Xiong Wang

Published: 18 Sept 2025, Last Modified: 30 Oct 2025

Venue: NeurIPS 2025 Datasets and Benchmarks Track (poster)

License: CC BY 4.0

Keywords: Vision-Language Model, Large Vision Language Model, Benchmark

TL;DR: AgMMU is a challenging real‑world benchmark for evaluating and advancing vision-language models (VLMs) in the knowledge‑intensive domain of agriculture.

Abstract: We present **AgMMU**, a challenging real‑world benchmark for evaluating and advancing vision-language models (VLMs) in the knowledge‑intensive domain of agriculture. Unlike prior datasets that rely on crowdsourced prompts, AgMMU is distilled from 116,231 authentic dialogues between everyday growers and USDA-authorized Cooperative Extension experts. Through a three‑stage pipeline (automated knowledge extraction, QA generation, and human verification), we construct (i) AgMMU, an evaluation set of 746 multiple‑choice questions (MCQs) and 746 open‑ended questions (OEQs), and (ii) AgBase, a development corpus of 57,079 multimodal facts covering five high-stakes agricultural topics: insect identification, species identification, disease categorization, symptom description, and management instruction. AgMMU has three key advantages:

- **Authentic & Expert‑Verified**: All facts, images, and answers originate from real farmer and gardener inquiries answered by credentialed specialists, ensuring high‑fidelity agricultural knowledge.
- **Complete Development Suite**: AgMMU uniquely couples a dual‑format evaluation benchmark (MCQ and OEQ) with AgBase, a large‑scale training set, enabling both rigorous assessment and targeted improvement of VLMs.
- **Knowledge‑Intensive Challenge**: Our tasks demand the synergy of nuanced visual perception and domain expertise, exposing fundamental limitations of current general‑purpose models and charting a path toward robust, application‑ready agricultural AI.

Benchmarking 12 leading VLMs reveals pronounced gaps in fine‑grained perception and factual grounding. Open‑source models trail proprietary ones by a wide margin. Simple fine‑tuning on AgBase improves open‑source model performance on the challenging OEQs by up to 11.6% on average, narrowing this gap and motivating future research on better strategies for knowledge extraction and distillation from AgBase. We hope AgMMU stimulates research on domain‑specific knowledge integration and trustworthy decision support in agricultural AI.

Croissant File: json

Dataset URL: https://huggingface.co/datasets/AgMMU/AgMMU_v1

Code URL: https://github.com/AgMMU/AgMMU

Supplementary Material: pdf

Primary Area: AI/ML Datasets & Benchmarks for life sciences (e.g. climate, health, life sciences, physics, social sciences)

Submission Number: 1067
