CrowdNeXt: Boosting Weakly Supervised Crowd Counting With Dual-Path Feature Aggregation and a Robust Loss Function

Published: 2025, Last Modified: 12 Feb 2026 · IEEE Trans. Instrum. Meas. 2025 · CC BY-SA 4.0
Abstract: Crowd counting is a popular research topic because of its broad applicability in areas such as safety monitoring, urban planning, and disaster management. The task is to accurately estimate the number of people in a dynamic video sequence or a static image, and timely, accurate estimates are crucial for public safety and monitoring. Recent work in crowd counting focuses on deep learning-based models, such as convolutional neural networks (CNNs) and vision transformers (ViTs). Most existing crowd-counting methods require point-level annotation of each person in the scene as ground truth to train the model. This annotation process is laborious and susceptible to errors. As a result, attention has shifted toward weakly supervised methods that require only the total person count in the image as ground truth. This work proposes a new pipeline for weakly supervised crowd counting and explores the utility of a robust mean absolute percentage error (MAPE) loss function in crowd counting. Performance evaluations on widely used datasets validate the effectiveness of the proposed method. Its performance is on par with fully supervised crowd-counting methods and significantly better than existing weakly supervised approaches.
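The abstract does not give the exact form of the MAPE loss, so the following is only a minimal sketch of the textbook definition applied to count-level supervision, where each image contributes one predicted count and one ground-truth count. The function name `mape_loss` and the stabilizer `eps` (added so that images with zero people do not cause division by zero) are assumptions for illustration, not details taken from the paper.

```python
import numpy as np


def mape_loss(pred_counts, true_counts, eps=1e-6):
    """Mean absolute percentage error over a batch of image-level counts.

    `eps` is a hypothetical stabilizer for images whose true count is 0;
    the paper's actual handling of that case is not stated in the abstract.
    """
    pred = np.asarray(pred_counts, dtype=np.float64)
    true = np.asarray(true_counts, dtype=np.float64)
    # Each error is divided by its own ground-truth count, so the loss
    # measures relative rather than absolute deviation.
    return float(np.mean(np.abs(pred - true) / (np.abs(true) + eps)))


# Two images: 90 predicted vs. 100 true, and 220 predicted vs. 200 true.
# Both are off by 10% of the true count.
loss = mape_loss([90, 220], [100, 200])
print(round(loss, 4))
```

Because every error is normalized by the true count, missing 10 people in a scene of 100 and missing 100 people in a scene of 1000 contribute equally, whereas a plain mean absolute error would be dominated by the denser scene.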