Robust Video Perception by Seeing Motion

17 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: Action Recognition, Robustness
TL;DR: We show that enforcing motion consistency at test time can improve video perception models' robustness.
Abstract: Despite their excellent performance, state-of-the-art computer vision models often fail when they encounter shifted distributions or adversarial examples. We find that existing video perception models fail because they are unable to perceive the correct motion. Inspired by the extensive evidence that motion is a key factor in the human visual system, we propose to correct what the model sees by restoring the perceived motion information. We construct a test-time constraint from motion information, without any human annotation, that all robust video perception models should respect. Our key observation is that this constraint is violated when the inputs are corrupted or adversarially attacked. By optimizing the input to respect the constraint at test time, we can adapt the inference to be robust. Visualizations and empirical experiments on the UCF101 and HMDB-51 datasets show that restoring motion information in deep vision models improves robustness under both common corruptions and worst-case perturbations.
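The abstract's core idea, optimizing the input at test time until the model's perceived motion again satisfies a self-supervised motion constraint, can be sketched in a minimal, dependency-free form. Everything here is an assumption for illustration: `perceive_motion` and `reference_motion` are hypothetical stand-ins for the model's motion branch and the annotation-free reference motion, and the paper's actual optimizer and loss are not specified on this page, so the sketch uses a plain squared-error consistency loss with finite-difference gradient descent.

```python
import numpy as np

def motion_consistency_adapt(x, perceive_motion, reference_motion,
                             steps=50, lr=0.1, eps=1e-4):
    """Test-time adaptation sketch: nudge the input x so that the model's
    perceived motion agrees with a self-supervised reference motion.

    perceive_motion / reference_motion are hypothetical callables (stand-ins
    for the paper's motion pathways, which this page does not detail).
    Gradients are estimated by central finite differences so the sketch
    stays dependency-free; this is only practical for toy-sized inputs.
    """
    x = x.astype(np.float64).copy()

    def consistency_loss(z):
        # Squared deviation between perceived and reference motion;
        # a robust model should keep this near zero on clean inputs.
        d = perceive_motion(z) - reference_motion(z)
        return float(np.sum(d * d))

    for _ in range(steps):
        grad = np.zeros_like(x)
        for i in np.ndindex(x.shape):
            x[i] += eps
            loss_plus = consistency_loss(x)
            x[i] -= 2 * eps
            loss_minus = consistency_loss(x)
            x[i] += eps
            grad[i] = (loss_plus - loss_minus) / (2 * eps)
        x -= lr * grad  # gradient step on the input, not the weights
    return x
```

As a toy usage, let the "model's" perceived motion simply equal the input and fix a reference motion; adaptation then pulls a corrupted input back toward the point where the constraint holds:

```python
x_corrupted = np.array([0.0, 0.0])
reference = np.array([1.0, -2.0])
x_adapted = motion_consistency_adapt(
    x_corrupted, lambda z: z, lambda z: reference, steps=200, lr=0.2)
# x_adapted is now close to the reference-consistent input [1.0, -2.0]
```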
Supplementary Material: zip
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 820