Assisted Few-Shot Learning for Vision-Language Models in Agricultural Stress Phenotype Identification
Keywords: Few-Shot Learning, In-Context Learning, Large Language Models, Vision Language Models, Agriculture
TL;DR: Assisted Few-Shot Learning boosts vision-language models for agricultural stress identification. It uses similarity-based retrieval to select relevant examples, improving performance on 6/7 agricultural tasks with limited data.
Abstract: In the agricultural sector, labeled data for crop diseases and stresses are often scarce due to high annotation costs. We propose an Assisted Few-Shot Learning approach to enhance vision-language models (VLMs) for image classification tasks with limited annotated data by optimizing the selection of input examples. Our method employs one image encoder at a time—Vision Transformer (ViT), ResNet-50, or CLIP—to retrieve contextually similar examples using cosine similarity of embeddings, thereby providing relevant few-shot prompts to VLMs. We evaluate our approach on an agricultural benchmark for VLMs, focusing on stress phenotyping, where the proposed method improves performance in 6 out of 7 tasks. Experimental results demonstrate that, using the ViT encoder, the average F1 score across seven agricultural classification tasks increased from 68.68\% to 80.45\%, highlighting the effectiveness of our method in improving model performance with limited data.
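The retrieval step described in the abstract—ranking candidate examples by cosine similarity of their embeddings and keeping the top matches as few-shot prompts—can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function name is hypothetical, and in the paper the embeddings would come from a ViT, ResNet-50, or CLIP encoder rather than the toy vectors used here.

```python
import numpy as np

def select_few_shot_examples(query_emb, candidate_embs, k=3):
    """Return indices of the k candidates most similar to the query,
    ranked by cosine similarity of their embeddings."""
    # Normalize so the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb)
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    sims = c @ q
    # Sort by descending similarity and keep the top k.
    return np.argsort(-sims)[:k]

# Toy 2-D "embeddings" standing in for encoder outputs.
query = np.array([1.0, 0.0])
candidates = np.array([[1.0, 0.1],
                       [0.0, 1.0],
                       [0.9, 0.2],
                       [-1.0, 0.0]])
print(select_few_shot_examples(query, candidates, k=2))  # → [0 2]
```

The selected candidates (here indices 0 and 2, the vectors most aligned with the query) would then be inserted as in-context examples in the VLM prompt.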
Submission Number: 137