DivShift: Exploring Domain-Specific Distribution Shifts in Large-Scale, Volunteer-Collected Biodiversity Datasets

Published: 11 Apr 2025, Last Modified: 01 Apr 2026Proceedings of the AAAI Conference on Artificial IntelligenceEveryoneCC BY 4.0
Abstract: Large-scale, volunteer-collected datasets of community-identified natural world imagery like iNaturalist have enabled marked performance gains for fine-grained visual classification of plant species using deep learning models. However, such datasets are opportunistic and lack a structured sampling strategy. Resulting geographic, temporal, observation quality, and socioeconomic biases inherent to this volunteer-based participatory data collection process are stymieing the wide uptake of these models for downstream biodiversity monitoring tasks, especially in the Global South. While widely documented in biodiversity modeling literature, the impact of these biases' downstream distribution shift on deep learning models have not been rigorously quantified. Here we introduce Diversity Shift (DivShift), a framework for quantifying the effects of biodiversity domain-specific distribution shifts on deep learning model performance. We also introduce DivShift - North American West Coast (DivShift-NAWC), a new curated dataset of almost 8 million iNaturalist plant observations across the western coast of North America, for diagnosing the effects of these biases in a controlled case study. Using this new dataset, we contrast computer vision model performance across a variety of these shifts and observe that these biases indeed confound model performance across observation quality, spatial location, and political boundaries. Interestingly, we find for all partitions that accuracy is lower than expected by chance from estimates of dataset shift from the data themselves, implying the structure within natural world images provides significant generalization improvements. From these observations, we suggest recommendations training computer vision models on natural world imagery biodiversity collections.
Loading