Dense and Indiscernible Object Counting in Agricultural Scenes

ICLR 2026 Conference Submission19736 Authors

19 Sept 2025 (modified: 08 Oct 2025) | ICLR 2026 Conference Submission | CC BY 4.0
Keywords: Object Counting; Agricultural Counting; Vision Language Model; Few shot
Abstract: Object counting in computer vision has traditionally focused on clearly visible objects. However, many real-world applications, such as crop yield estimation and fruit harvest planning in agriculture, involve dense and indiscernible object counting (DIOC). These objects are small, densely distributed, and visually ambiguous against their surroundings, which makes traditional counting methods impractical. To facilitate research on this crucial yet unexplored challenge, we introduce DIOCblueberry, a specialized dataset that significantly surpasses existing datasets in complexity. Compared to FSC147, the most comprehensive general counting dataset, DIOCblueberry contains 1.9 times more objects per image, with an average of 108 instances, while its box pixel ratio of 2.38‰ is 7.9 times smaller. State-of-the-art counting methods struggle in such challenging scenarios and produce high counting errors. To address these challenges, we propose MaskCount, a two-stage multi-modal method. The first stage segments objects from complex backgrounds using multi-modal features, while the second stage enhances feature robustness through a contrastive loss. We also design an edge-aware patch cropping mechanism for accurate counting of dense and small objects. Extensive experiments demonstrate that MaskCount achieves substantial improvements over previous state-of-the-art methods, reducing MAE and RMSE by 25.13% and 35.17%, respectively, on DIOCblueberry. We will release our data, models, and code to the public.
Primary Area: datasets and benchmarks
Submission Number: 19736
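
The abstract reports counting quality as MAE and RMSE and states improvements as percentage reductions. As a minimal sketch of how such figures are conventionally computed (not the authors' evaluation code), the snippet below uses the standard definitions over per-image predicted and ground-truth counts. The function names and the toy counts are hypothetical and chosen only for illustration.

```python
import math


def mae(pred_counts, true_counts):
    """Mean absolute error between predicted and ground-truth per-image counts."""
    return sum(abs(p - t) for p, t in zip(pred_counts, true_counts)) / len(true_counts)


def rmse(pred_counts, true_counts):
    """Root mean squared error between predicted and ground-truth per-image counts."""
    squared = sum((p - t) ** 2 for p, t in zip(pred_counts, true_counts))
    return math.sqrt(squared / len(true_counts))


def relative_reduction(baseline_error, new_error):
    """Percentage by which new_error is lower than baseline_error."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Toy data (assumed, not taken from DIOCblueberry): three images.
predicted = [100, 110, 90]
ground_truth = [108, 108, 108]

print(f"MAE:  {mae(predicted, ground_truth):.3f}")
print(f"RMSE: {rmse(predicted, ground_truth):.3f}")
```

RMSE penalizes large per-image misses more heavily than MAE, so the two reductions reported for a method can differ, as they do in the abstract.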