Density-aware and Depth-aware Visual Representation for Zero-Shot Object Counting

Published: 2025, Last Modified: 15 Jan 2026, Venue: ICASSP 2025, License: CC BY-SA 4.0
Abstract: Previous zero-shot object counting methods often use CLIP semantic classifiers built from class names, but they ignore density and depth knowledge that is crucial for counting. We therefore propose a density-aware and depth-aware prompt counting model, which captures density information by learning density-aware prompts with a density-aware contrastive loss and incorporates depth guidance through predefined depth-aware prompts. To facilitate training, we design one strategy each for the standard counting loss and the contrastive loss: the former initially prioritizes larger and sparser objects and gradually shifts focus to smaller and denser ones, while the latter adopts coarse-to-fine density learning. In addition, we construct a dataset named LVIS-372, which covers more real-world scenarios and has a more balanced instance distribution than existing datasets. Experimental results demonstrate the effectiveness of the proposed method.