Language-Guided Zero-Shot Object Counting

Mingjie Wang, Song Yuan, Zhuohang Li, Longlong Zhu, Eric Buys, Minglun Gong

Published: 2024, Last Modified: 18 Oct 2025, ICME Workshops 2024, CC BY-SA 4.0

Abstract: The Class-Agnostic Counting (CAC) problem has recently attracted increasing attention owing to its generality and superior efficiency compared to Category-Specific Counting (CSC). This paper proposes ExpressCount, a novel approach that enhances zero-shot object counting via deep language-guided learning. Specifically, ExpressCount comprises an innovative Language-oriented Exemplar Perceptron and a downstream visual Zero-shot Counting pipeline. The perceptron exploits exemplar cues from language-vision signals by inheriting rich semantic priors from pre-trained Large Language Models (LLMs), whereas the counting pipeline mines fine-grained features through dual-branch and cross-attention schemes, contributing to high-quality similarity learning. Apart from bridging popular LLMs and visual counting tasks, expression-guided exemplar estimation advances zero-shot learning capabilities for counting instances of arbitrary classes. Moreover, we devise FSC-147-Express, annotated with meticulous linguistic expressions, which opens a new avenue for developing and validating language-based counting models. Extensive experiments demonstrate the state-of-the-art performance of ExpressCount, with accuracy even on par with some CSC models.
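
The abstract mentions cross-attention and similarity learning between exemplar cues and image features. As a rough illustration of those two generic ideas only, here is a minimal pure-Python sketch. It is not the authors' implementation: the function names, the toy 2-D feature vectors, and the use of cosine similarity are all assumptions made for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        # Output is the attention-weighted sum of the value vectors.
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(dim)])
    return out

def cosine_similarity(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

# Hypothetical toy data: three image-patch features and two exemplar tokens.
patches = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
exemplar_tokens = [[1.0, 0.0], [0.0, 1.0]]

# Image patches (queries) attend to exemplar tokens (keys and values).
attended = cross_attention(patches, exemplar_tokens, exemplar_tokens)

# Per-patch similarity to the first exemplar token (assumed target class).
similarity_map = [cosine_similarity(p, exemplar_tokens[0]) for p in patches]
```

In this toy setup, patches that resemble the first exemplar token receive a high similarity score; a real counting model would regress a density map from such similarity features rather than use the raw scores directly.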