- Keywords: deep learning, robustness, adversarial, imagenet
- TL;DR: We introduce a systematic framework for quantifying the robustness of classifiers to naturally occurring perturbations of images found in videos.
- Abstract: We introduce a systematic framework for quantifying the robustness of classifiers to naturally occurring perturbations of images found in videos. As part of this framework, we construct ImageNet-Vid-Robust, a human-expert--reviewed dataset of 22,668 images grouped into 1,145 sets of perceptually similar images derived from frames in the ImageNet Video Object Detection dataset. We evaluate a diverse array of classifiers trained on ImageNet, including models trained for robustness, and show a median classification accuracy drop of 16\%. Additionally, we evaluate the Faster R-CNN and R-FCN models for detection, and show that natural perturbations induce both classification as well as localization errors, leading to a median drop in detection mAP of 14 points. Our analysis shows that natural perturbations in the real world are heavily problematic for current CNNs, posing a significant challenge to their deployment in safety-critical environments that require reliable, low-latency predictions.