AgriPerceiver: A Parameter-Efficient Vision-Language Model for Structured Macroscopic Crop Phenotyping

Vatsal Khanna; Davinder Singh

Published: 28 May 2026, Last Modified: 11 Jun 2026

Venue: ICML 2026 FM4LS Workshop Poster

License: CC BY 4.0

Keywords: Vision-Language Models, Perceiver Resampler, Parameter-Efficient Fine-Tuning, Agricultural Pathology, Crop Phenotyping, Multimodal Grounding, Visual Token Compression, Scalable AI for Life Sciences

TL;DR: A Parameter-Efficient Vision-Language Model for Structured Macroscopic Crop Phenotyping

Abstract: Modern agriculture suffers from systemic crop yield instability, with plant pathogens contributing to 20-40\% of global yield losses. Adapting generalist Vision-Language Models (VLMs) to produce structured, actionable phenotyping from crop images creates a scalability crisis at the intersection of computer vision and agricultural life sciences. We present AgriPerceiver, a lightweight VLM that frames this challenge as structured report generation: given a single leaf photograph, the model produces a schema-compliant JSON diagnostic report detailing disease identity, pathology type, severity score, symptom characterisation, and actionable steps. To process high-resolution imagery, the input is spatially decomposed into tiles to preserve fine-grained pathological features that are typically lost during standard resizing. Our central contribution is a perception bridge that mitigates visual token explosion by compressing visual tokens into just 128 learned latents ($28.5\times$ reduction) while preserving learned tile-position embeddings for spatial grounding. We employ a two-stage training curriculum: the bridge (${\sim}391$M parameters) is aligned first, and LoRA specialization on Gemma-3-labelled structured annotations follows. Both the vision and language backbones remain strictly frozen throughout training to maintain parameter efficiency. Evaluated across nine metrics, AgriPerceiver achieves a composite score of 0.810 and 99.7\% schema compliance on a held-out test set, demonstrating the viability of parameter-efficient domain specialization in life sciences AI for structured knowledge extraction.
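To make the schema-compliance metric concrete, the following is a minimal sketch of how a generated report could be checked against a fixed JSON schema and how a compliance rate could be computed over a test set. The abstract names the five report elements but not the actual schema, so the field names, types, and the helper names `is_schema_compliant` and `compliance_rate` are assumptions for illustration only, not the authors' evaluation code.

```python
import json

# Hypothetical schema: field names and types are assumptions that mirror the
# five report elements listed in the abstract, not the paper's actual schema.
REPORT_SCHEMA = {
    "disease": str,
    "pathology_type": str,
    "severity": (int, float),
    "symptoms": str,
    "actions": list,
}


def is_schema_compliant(raw_output: str) -> bool:
    """Return True if raw_output is a JSON object with exactly the expected keys and types."""
    try:
        report = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    if not isinstance(report, dict) or set(report) != set(REPORT_SCHEMA):
        return False
    for key, expected in REPORT_SCHEMA.items():
        value = report[key]
        # No field in this sketch is boolean; bool is a subclass of int in
        # Python, so reject it explicitly to keep `severity` numeric.
        if isinstance(value, bool) or not isinstance(value, expected):
            return False
    return True


def compliance_rate(outputs):
    """Fraction of model outputs that pass the schema check (0.0 for an empty list)."""
    if not outputs:
        return 0.0
    return sum(is_schema_compliant(o) for o in outputs) / len(outputs)
```

Under this reading, a reported compliance of 99.7\% would correspond to `compliance_rate` returning roughly 0.997 over the held-out outputs. A stricter check could also validate value ranges, such as a bounded severity score, if the schema defines them.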

Submission Number: 57
