Abstract: Drone-based crowd tracking faces difficulties in accurately identifying and monitoring objects from an aerial perspective, largely because the objects are small and closely spaced, which complicates both localization and tracking. To address these challenges, we present the Density-aware Tracking (DenseTrack) framework. DenseTrack leverages crowd counting to determine object locations precisely, blending visual and motion cues to improve the tracking of small-scale objects. It specifically addresses the problem of cross-frame motion to improve tracking accuracy and reliability. DenseTrack uses crowd density estimates as anchors for precise object localization within video frames. These estimates are fused with motion and position information from the tracking network, with motion offsets serving as key tracking cues. Moreover, DenseTrack improves the discrimination of small-scale objects by drawing on insights from a vision-language model, integrating appearance with motion cues. The framework uses the Hungarian algorithm to match individuals accurately across frames. Evaluated on the DroneCrowd dataset, our approach achieves superior performance, confirming its effectiveness in drone-captured scenarios.
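The association step summarized in the abstract (motion offsets and appearance cues combined into a cost, then matched with the Hungarian algorithm) can be illustrated with a minimal sketch. This is not DenseTrack's implementation: the helper names `hungarian` and `associate`, the cosine appearance similarity, the weight `alpha`, the distance normalization `dist_scale`, and the gating threshold `max_cost` are all hypothetical. It assumes object locations are already available as points (for example, derived from density estimates) and that each track carries a predicted motion offset and an appearance feature vector; the density-based localization itself is not covered.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def hungarian(cost: List[List[float]]) -> List[int]:
    """Hungarian (Kuhn-Munkres) method for a square cost matrix.

    Returns assign, where assign[i] is the column matched to row i,
    minimizing the total cost. O(n^3) potentials-based formulation.
    """
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)  # row potentials (1-indexed)
    v = [0.0] * (n + 1)  # column potentials (1-indexed)
    p = [0] * (n + 1)    # p[j] = row currently matched to column j (0 = none)
    way = [0] * (n + 1)  # back-pointers for augmenting paths
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while True:  # flip the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    assign = [0] * n
    for j in range(1, n + 1):
        assign[p[j] - 1] = j - 1
    return assign


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (hypothetical appearance-similarity choice)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def associate(tracks: List[Point], offsets: List[Point],
              track_feats: List[Sequence[float]], dets: List[Point],
              det_feats: List[Sequence[float]], alpha: float = 0.7,
              dist_scale: float = 10.0, max_cost: float = 1.0) -> List[Tuple[int, int]]:
    """Hypothetical tracker-to-detection matching; returns (track, det) pairs."""
    nt, nd = len(tracks), len(dets)
    n = max(nt, nd)
    if n == 0:
        return []
    dummy = max_cost + 1.0  # padding cost; dummy matches are always rejected
    cost = [[dummy] * n for _ in range(n)]
    for i, ((x, y), (dx, dy)) in enumerate(zip(tracks, offsets)):
        px, py = x + dx, y + dy  # position predicted by the motion offset
        for j, (qx, qy) in enumerate(dets):
            dist = math.hypot(px - qx, py - qy) / dist_scale
            app = 1.0 - cosine(track_feats[i], det_feats[j])
            cost[i][j] = alpha * dist + (1.0 - alpha) * app
    assign = hungarian(cost)
    # Keep only real track/detection pairs that pass the gating threshold.
    return [(i, j) for i, j in enumerate(assign)
            if i < nt and j < nd and cost[i][j] <= max_cost]
```

For example, two tracks at (0, 0) and (10, 0) with offsets (1, 0) are predicted at (1, 0) and (11, 0), so detections at (11.2, 0) and (0.9, 0) with matching features are assigned as `[(0, 1), (1, 0)]`. Padding the matrix to a square with a uniform dummy cost lets unequal numbers of tracks and detections be handled, with unmatched entries treated as track terminations or new tracks.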
Primary Subject Area: [Content] Vision and Language
Secondary Subject Area: [Content] Media Interpretation, [Experience] Multimedia Applications
Relevance To Conference: Our work on DenseTrack is highly relevant to the fields of Multimodal and Multimedia. Multimodal: DenseTrack embodies the principle of multimodal learning, which combines information from different sources to improve performance. Specifically, for drone-based multi-object tracking, DenseTrack draws on crowd counting techniques, appearance cues, motion cues, and vision-language models to determine object positions accurately and improve tracking accuracy. Its use of insights from vision-language models further reflects its involvement in multimodal data processing. Multimedia: DenseTrack addresses the challenge of accurately identifying and monitoring objects in drone-based crowd tracking, which involves processing multimedia data from drone videos. In summary, DenseTrack applies multimodal learning to crowd tracking and operates on multimedia data from drone videos, making it relevant to both the Multimodal and Multimedia research domains.
Supplementary Material: zip
Submission Number: 1254