Domain-Agnostic Crowd Counting via Uncertainty-Guided Style Diversity Augmentation

Published: 20 Jul 2024, Last Modified: 21 Jul 2024, MM2024 Poster, CC BY 4.0
Abstract: Domain shift poses a significant barrier to the performance of crowd counting algorithms in unseen domains. While domain adaptation methods address this challenge by utilizing images from the target domain, they become impractical when acquiring target-domain images is difficult. Additionally, these methods require extra training time due to the need for fine-tuning on target domain images. To tackle this problem, we propose an Uncertainty-Guided Style Diversity Augmentation (UGSDA) method, enabling crowd counting models to be trained solely on the source domain and directly generalized to different unseen target domains. This is achieved by generating sufficiently diverse and realistic samples during the training process. Specifically, our UGSDA method incorporates three tailor-designed components: the Global Styling Elements Extraction (GSEE) module, the Local Uncertainty Perturbations (LUP) module, and the Density Distribution Consistency (DDC) loss. The GSEE module extracts global style elements from the feature space of the whole source domain. The LUP module obtains uncertainty perturbations from the batch-level input to form style distributions beyond the source domain, which are used, together with the global style elements, to generate diversified stylized samples. To regulate the extent of the perturbations, the DDC loss imposes constraints between the source samples and the stylized samples, ensuring that the stylized samples remain realistic and reliable. Comprehensive experiments validate the superiority of our approach, demonstrating its strong generalization capabilities across various datasets and models. Our code will be made publicly available.
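The abstract does not include implementation details, so the following is a minimal PyTorch sketch of how the three components could interact under an AdaIN-style feature-statistics view of "style". The function names (`uncertainty_stylize`, `density_consistency_loss`), the Gaussian noise model standing in for the LUP perturbations, the running global statistics standing in for GSEE, and the KL-based form of the DDC constraint are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def feature_stats(feat, eps=1e-6):
    """Channel-wise mean and std of a feature map of shape (B, C, H, W)."""
    mu = feat.mean(dim=(2, 3), keepdim=True)
    sigma = (feat.var(dim=(2, 3), keepdim=True) + eps).sqrt()
    return mu, sigma

def uncertainty_stylize(feat, global_mu, global_sigma, noise_scale=1.0, eps=1e-6):
    """Hypothetical UGSDA-style augmentation sketch.

    Normalizes features with their own per-sample statistics, then re-styles
    them with global source-domain statistics (a stand-in for GSEE) perturbed
    by Gaussian noise whose scale follows the batch-level variability of the
    statistics themselves (a stand-in for LUP's uncertainty estimate).
    """
    mu, sigma = feature_stats(feat, eps)
    normalized = (feat - mu) / sigma

    # Uncertainty: how much the per-sample statistics vary within the batch.
    mu_uncert = mu.var(dim=0, keepdim=True, unbiased=False).sqrt()
    sigma_uncert = sigma.var(dim=0, keepdim=True, unbiased=False).sqrt()

    # Perturb the global style elements; noise grows with batch-level uncertainty.
    new_mu = global_mu + noise_scale * mu_uncert * torch.randn_like(mu)
    new_sigma = global_sigma + noise_scale * sigma_uncert * torch.randn_like(sigma)

    return normalized * new_sigma.clamp_min(eps) + new_mu

def density_consistency_loss(pred_src, pred_styl):
    """Hypothetical DDC-style constraint: the stylized branch should predict a
    density map whose spatial distribution matches the source branch."""
    p = F.softmax(pred_src.flatten(1), dim=1)
    log_q = F.log_softmax(pred_styl.flatten(1), dim=1)
    return F.kl_div(log_q, p, reduction="batchmean")

if __name__ == "__main__":
    feat = torch.randn(4, 64, 32, 32)    # backbone features of a source batch
    g_mu = torch.zeros(1, 64, 1, 1)      # assumed global style elements (e.g., running stats)
    g_sigma = torch.ones(1, 64, 1, 1)
    styled = uncertainty_stylize(feat, g_mu, g_sigma)
    loss = density_consistency_loss(torch.randn(4, 1, 32, 32), torch.randn(4, 1, 32, 32))
    print(styled.shape, loss.item())
```

In this sketch the stylized features would be fed through the same counting head as the source features, with the consistency term regulating how far the perturbed styles may drift from the source density distribution; the actual modules in the paper may differ substantially.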
Primary Subject Area: [Content] Vision and Language
Secondary Subject Area: [Experience] Multimedia Applications
Relevance To Conference: Our study introduces the Uncertainty-Guided Style Diversity Augmentation (UGSDA) method for domain-agnostic multimedia understanding, significantly advancing multimedia and multimodal processing by enabling models to generalize across unseen domains without target domain data. This innovation addresses the pervasive challenge of domain variation in multimedia, where processing and analyzing images across diverse environments is crucial. Specifically, we employ novel techniques like the Global Styling Elements Extraction (GSEE) and Local Uncertainty Perturbations (LUP), alongside the Density Distribution Consistency (DDC) loss, to produce diversified and realistic training samples. These advancements allow models to adapt effectively to new domains, enhancing the versatility and applicability of multimedia systems. The core contribution of our work is the development of domain-agnostic algorithms, which operate across various contexts and modalities without domain-specific training data, broadening the scope of multimedia applications. This is especially valuable in settings where acquiring annotated data is difficult, thus expanding the potential for multimedia applications in varied scenarios like crowd counting. By addressing domain shift and fostering more adaptable, generalizable models, our research moves multimedia processing forward, opening new paths for application in scenarios characterized by data diversity and domain variance.
Submission Number: 3348