InsectAgent: Improving Insect Recognition through Dynamic Information Augmentation with Multimodal Large Language Models
Abstract: Insect recognition remains a critical challenge for biodiversity monitoring, conservation efforts, and agricultural sustainability. Current computer vision approaches struggle to identify species accurately because the morphological differences between them are subtle. Our analysis reveals that while vision classifiers frequently fail to predict the correct species as their top choice, they consistently include the true species among their top candidate predictions. This indicates that expert entomological knowledge is required to resolve ambiguities when vision classifiers fail. We present InsectAgent, a novel two-stage framework that enhances insect recognition through dynamic information augmentation using Multimodal Large Language Models (MLLMs). In the first stage, a vision classifier generates candidate species predictions with confidence scores. When the confidence falls below a threshold, the second stage activates, retrieving relevant taxonomic knowledge from an expert knowledge base and invoking an MLLM for further analysis. This conditional MLLM invocation strategy significantly reduces computational costs by avoiding expensive model calls for high-confidence predictions while ensuring expert-level reasoning for ambiguous cases. The information-augmented reasoning process combines visual cues with domain expertise, mirroring expert entomologists’ workflow. Experimental results demonstrate that InsectAgent significantly outperforms standalone vision classifiers, achieving an average relative improvement of 14.24% in accuracy for insect identification tasks.
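The confidence-gated control flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `recognize`, the `Result` type, the threshold of 0.8, the `top_k` cutoff, the dictionary-based knowledge base, and the `mllm` callable signature are all hypothetical choices made for illustration. The vision classifier is assumed to have already produced `(species, confidence)` pairs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Candidate = Tuple[str, float]  # (species name, classifier confidence)


@dataclass
class Result:
    species: str
    used_mllm: bool  # True if the second stage was invoked


def recognize(
    image: object,                      # opaque image handle, passed through to the MLLM
    candidates: List[Candidate],        # stage-one classifier output (assumed given)
    knowledge_base: Dict[str, str],     # hypothetical: species name -> taxonomic notes
    mllm: Callable[[object, List[str], Dict[str, str]], str],  # hypothetical interface
    threshold: float = 0.8,             # assumed value; the abstract does not state one
    top_k: int = 5,                     # assumed number of candidates to keep
) -> Result:
    if not candidates:
        raise ValueError("classifier returned no candidates")

    # Stage 1: rank candidates and accept the top one if it is confident enough.
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)[:top_k]
    top_species, top_conf = ranked[0]
    if top_conf >= threshold:
        return Result(top_species, used_mllm=False)

    # Stage 2: retrieve knowledge for the candidates and ask the MLLM to decide.
    names = [name for name, _ in ranked]
    notes = {name: knowledge_base.get(name, "") for name in names}
    choice = mllm(image, names, notes)

    # Fall back to the classifier's top choice if the MLLM answers outside the candidates.
    if choice not in names:
        choice = top_species
    return Result(choice, used_mllm=True)
```

In this sketch the MLLM is called only when the top confidence is below the threshold, which is the source of the cost saving the abstract describes. Restricting the MLLM's answer to the candidate list reflects the observation that the true species usually appears among the top candidates.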