CLIPping Imbalances: A Novel Evaluation Baseline and PEARL Dataset for Pedestrian Attribute Recognition
Abstract: Pedestrian Attribute Recognition (PAR) is a fundamental task in computer vision and is crucial for enhancing security systems, as it enables the precise identification and characterization of pedestrian attributes. However, current PAR datasets fail to represent a wide range of attributes adequately, which limits the effectiveness of existing PAR methods in real-world scenarios.
Addressing this limitation, this paper introduces PEARL, a comprehensive dataset comprising diverse pedestrian images annotated with 146 attributes and sourced from surveillance videos across twelve countries. This paper also formulates image-based PAR with a language-image fusion strategy and establishes CLIP as a new evaluation baseline. Specifically, we leverage textual information by transforming sets of attributes into meaningful sentences. To address the inherent data imbalance in PAR, we provide three types of prompt settings to optimize the training of the CLIP model. Our evaluation encompasses a thorough assessment of the proposed baseline model across various datasets, including the PEARL dataset as well as established PAR benchmarks such as PA100K, RAP, and PETA.
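As a minimal sketch of the attribute-to-sentence prompting idea described in the abstract, assuming the public OpenAI CLIP package: the attribute names, prompt template, and image path below are illustrative placeholders, not the paper's actual 146-attribute vocabulary or its three prompt settings.

import torch
import clip  # OpenAI CLIP: https://github.com/openai/CLIP
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def attributes_to_sentence(attrs):
    # Turn a set of attribute labels into one descriptive prompt.
    return "A photo of a pedestrian who is " + ", ".join(attrs) + "."

# Hypothetical prompts built from two contrasting attribute sets.
prompts = [
    attributes_to_sentence(["female", "wearing a hat"]),
    attributes_to_sentence(["male", "not wearing a hat"]),
]

image = preprocess(Image.open("pedestrian.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(prompts).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)  # image-text similarity logits
    probs = logits_per_image.softmax(dim=-1)

print(probs)  # probability the image matches each attribute sentence

The sketch is zero-shot; the proposed baseline instead fine-tunes CLIP on such sentences under the imbalance-aware prompt settings described above.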