Benchmarking the Robustness of CNN-based Spatial-Temporal Models

06 Jun 2021 (modified: 24 May 2023) · Submitted to NeurIPS 2021 Datasets and Benchmarks Track (Round 1)
Abstract: State-of-the-art deep convolutional neural networks are vulnerable to corruptions that commonly occur in nature (e.g., input data corrupted by weather changes or system errors). While rapid progress has been made in analyzing and improving the robustness of models for image understanding, robustness in video understanding has been largely ignored. In this paper, we establish a corruption robustness benchmark, consisting of Mini Kinetics-C and Mini SSV2-C, which considers temporal corruptions in addition to the spatial corruptions studied for images. We make the first attempt to conduct an exhaustive study of corruption robustness in both the spatial and temporal domains, using established CNN-based spatial-temporal models. The study provides guidance on robust model design, training, and inference: 1) 3D modules make video classification models more robust than 2D modules do, 2) longer input length and uniform sampling of input frames can benefit corruption robustness, 3) corruption robustness (especially robustness in the temporal domain) improves with computational cost, which may conflict with the current trend of improving the computational efficiency of models. Our code is available at https://github.com/Newbeeyoung/Video-Corruption-Robustness.
URL: https://github.com/Newbeeyoung/Video-Corruption-Robustness
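The abstract refers to uniform sampling of input frames and to spatial corruptions applied to video. Below is a minimal sketch, not the authors' released code (that lives at the repository linked above), of what uniform frame sampling and a simple per-frame Gaussian-noise corruption could look like; the function names, the severity parameter, and the choice of Gaussian noise are illustrative assumptions, and the benchmark itself defines its own corruption types and severity levels.

```python
# Illustrative sketch only: uniform frame sampling plus a per-frame
# (spatial) Gaussian-noise corruption on a video clip. Not the official
# Mini Kinetics-C / Mini SSV2-C pipeline.

import numpy as np


def uniform_sample_frames(video: np.ndarray, num_frames: int) -> np.ndarray:
    """Pick `num_frames` evenly spaced frames from a (T, H, W, C) clip."""
    total = video.shape[0]
    indices = np.linspace(0, total - 1, num_frames).round().astype(int)
    return video[indices]


def gaussian_noise_corruption(video: np.ndarray, severity: float = 0.1) -> np.ndarray:
    """Apply per-frame Gaussian noise to a clip with values in [0, 1]."""
    noise = np.random.normal(loc=0.0, scale=severity, size=video.shape)
    return np.clip(video + noise, 0.0, 1.0)


if __name__ == "__main__":
    # Dummy 64-frame RGB clip at 112x112 resolution, values in [0, 1].
    clip = np.random.rand(64, 112, 112, 3).astype(np.float32)

    sampled = uniform_sample_frames(clip, num_frames=16)
    corrupted = gaussian_noise_corruption(sampled, severity=0.1)
    print(sampled.shape, corrupted.shape)  # (16, 112, 112, 3) for both
```

A temporal corruption, by contrast, would perturb the frame sequence itself (e.g., dropping, freezing, or reordering frames) rather than the pixels of individual frames.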