PaddyVLM: An Expert-tuned Vision-Language Model for Paddy Disease Diagnosis.

Published: 09 Dec 2025, Last Modified: 25 Jan 2026AgriAI 2026EveryoneRevisionsBibTeXCC BY 4.0
Keywords: paddy_disease_detection, pest_identification, crop_health_monitoring, rice_pathology_analysis, agricultural_diagnostics, plant_stress_detection, precision_agriculture, agricultural_decision_support, multimodal_AI, vision_language_models, domain_adapted_VLMs, instruction_tuning, expert_tuned_AI, fine_grained_recognition, multimodal_reasoning, zero_shot_inference
TL;DR: PaddyVLM is a domain-tuned vision-language model that uses expert-curated multimodal data to accurately identify paddy diseases and pests and provide actionable guidance, outperforming general LMMs in agricultural diagnosis.
Abstract: Large multimodal models (LMMs) excel at general visionlanguage reasoning but often underperform in agriculture, where disease and pest diagnosis demands fine-grained, domain-specific understanding. We present PaddyVLM, a domain-adapted vision-language model for paddy crop analysis, capable of identifying diseases and pests, assessing severity, and providing actionable guidance. Built on LLaVA-v1.5-7B-LoRA, our model is trained using PaddyInstruct, a curated instruction-tuning dataset derived from the Paddy Doctor (10,407 images, 10 classes) and Paddy Pest (5,673 images, 20 classes) datasets, annotated and verified by agronomists. PaddyInstruct combines LLaVA-13B–generated descriptions, Mistral-7B–generated Q&A and multi-turn dialogues, and expert knowledge refinement. Fine-tuning on this dataset equips PaddyVLM with robust fine-grained recognition and context-aware reasoning. Experiments show that PaddyVLM substantially outperforms general-purpose LMMs in both disease and pest understanding, demonstrating its potential as a practical expert assistant for farmers and agricultural researchers. All code, datasets, and trained models are available at https://anonymous.4open. science/r/paddy-vlm-7A67/.
Submission Number: 23
Loading