Context-Aware Data Augmentation for Efficient Object Detection by UAV Surveillance

Published: 01 Jan 2022, Last Modified: 06 Mar 2025, ISDFS 2022, CC BY-SA 4.0
Abstract: The problem of object detection with the YOLOv4 deep neural network (DNN) is considered on the Stanford Drone Dataset (SDD), whose object classes (pedestrians, bicyclists, cars, skateboarders, golf carts, and buses) were collected by Unmanned Aerial Vehicle (UAV) video surveillance. Some frames (images) with labels were extracted from the videos of this dataset and structured in the open-access SDD frames (SDDF) version (https://www.kaggle.com/yoctoman/stanford-drone-dataset-frames). Context-aware data augmentation (CADA) was proposed to change bounding box (BB) sizes by some percentage of their width and height. To investigate the possible effect of dataset labeling quality, "dirty" and "clean" versions of the dataset were prepared, which differ only in the evaluation subset. Compared with experiments without CADA, CADA procedures lead to a significant improvement in performance, in terms of both loss and mean average precision (mAP), on both the "dirty" and the "clean" evaluation subsets. Moreover, with CADA procedures the mAP values on the "dirty" (real) evaluation subset can be similar to (and for some classes even higher than) the mAP values on the "clean" (ground truth, GT) evaluation subset without CADA procedures. This effect can be explained by an increase in the signal-to-noise ratio of object-to-background pairs after IN-like cropping CADA procedures, followed by an increase in the variability of object-to-background pairs after subsequent OUT-like enlarging CADA procedures. Note that CADA-based retraining procedures are non-commutative: the reverse order, first-OUT-then-IN CADA, did not increase mAP values nearly as much as first-IN-then-OUT CADA. Several CADA sequences were analyzed, and the best strategy consists of first-IN-then-OUT CADA procedures, where the extent of decrease and increase of BB width and height can differ across applications and datasets.
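The box-resizing step of CADA can be illustrated with a minimal sketch. It assumes that "changing BB sizes by some percentage of width and height" means rescaling each annotation box about its center in Darknet/YOLO label format (class id, then center x, center y, width, height, all normalized to [0, 1]), with negative percentages giving IN-like shrinking and positive ones OUT-like enlarging. The names `YoloBox`, `resize_box`, `cada_labels`, and the percentages in `SCHEDULE` are hypothetical illustrations, not the authors' code or settings.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class YoloBox:
    """One Darknet/YOLO label: class id plus normalized cx, cy, w, h."""
    cls: int
    cx: float
    cy: float
    w: float
    h: float


def resize_box(box: YoloBox, pct_w: float, pct_h: float) -> YoloBox:
    """Shrink (pct < 0, IN-like) or enlarge (pct > 0, OUT-like) a box by a
    percentage of its own width and height. The center stays fixed unless
    the enlarged box must be clipped at the image border."""
    if pct_w <= -100.0 or pct_h <= -100.0:
        raise ValueError("percentages must be greater than -100")
    new_w = box.w * (1.0 + pct_w / 100.0)
    new_h = box.h * (1.0 + pct_h / 100.0)
    # Convert to corner coordinates and clip them to the image [0, 1].
    x0 = max(0.0, box.cx - new_w / 2)
    x1 = min(1.0, box.cx + new_w / 2)
    y0 = max(0.0, box.cy - new_h / 2)
    y1 = min(1.0, box.cy + new_h / 2)
    # Convert the clipped corners back to center/size form.
    return YoloBox(box.cls, (x0 + x1) / 2, (y0 + y1) / 2, x1 - x0, y1 - y0)


def parse_line(line: str) -> YoloBox:
    cls, cx, cy, w, h = line.split()
    return YoloBox(int(cls), float(cx), float(cy), float(w), float(h))


def format_line(box: YoloBox) -> str:
    return f"{box.cls} {box.cx:.6f} {box.cy:.6f} {box.w:.6f} {box.h:.6f}"


def cada_labels(lines: List[str], pct_w: float, pct_h: float) -> List[str]:
    """Apply one CADA resize to every label line of an image."""
    return [format_line(resize_box(parse_line(l), pct_w, pct_h)) for l in lines]


# Hypothetical first-IN-then-OUT schedule: each stage would produce a
# relabeled training set used to retrain the model from the previous stage.
SCHEDULE: List[Tuple[str, float, float]] = [
    ("IN", -10.0, -10.0),
    ("OUT", 10.0, 10.0),
]

if __name__ == "__main__":
    labels = ["0 0.5 0.5 0.2 0.4"]
    for stage, pw, ph in SCHEDULE:
        print(stage, cada_labels(labels, pw, ph))
```

Running the script prints the IN stage label as `0 0.500000 0.500000 0.180000 0.360000` and the OUT stage label as `0 0.500000 0.500000 0.220000 0.440000`. Width and height percentages are kept separate because the abstract notes that the extent of decrease and increase can differ per application and dataset; clipping keeps enlarged boxes inside the frame, which shifts the center only for boxes touching the border.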