Frame Adaptive Network

22 Sept 2022 (modified: 13 Feb 2023), ICLR 2023 Conference Withdrawn Submission, Readers: Everyone
Keywords: Video Recognition, Temporal Deviation
TL;DR: We propose a framework for training video recognition models that can be evaluated at multiple frame counts and that outperform models trained individually for each frame count.
Abstract: Existing video recognition algorithms typically use a separate training pipeline for each input frame count, which requires repeated training runs and multiplies storage costs. When a model is evaluated at a frame count that was not used in training, its performance drops significantly, an observation we name Temporal Deviation (see Fig. 1). The common training protocol for video tasks is therefore too rigid for flexible inference with varying numbers of test frames, especially on edge devices with limited available frames or computational resources. In this study, we propose the Frame Adaptive Network (FAN), which needs only one-shot training yet allows the model to be evaluated at different frame counts. Concretely, FAN integrates several sets of training sequences, uses Specialized Normalization and Weight Alteration to efficiently expand the original network, and leverages Mutual Distillation for optimization. Comprehensive empirical validation across various architectures and popular benchmarks demonstrates the effectiveness and generalization of FAN (e.g., performance gains of 3.50%/5.76%/2.38% at 4/8/16 frames on the Something-Something V1 dataset over the competing method Uniformer), which also indicates its practical potential for model deployment.
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning
