Keywords: Temporal Action Detection, Untrimmed Video Understanding, Efficient Detection
Abstract: Temporal action detection (TAD) often demands substantial computing resources because untrimmed videos are long. Consequently, under limited resources, most action detectors operate on pre-extracted features rather than raw video frames, which leads to sub-optimal solutions. In this work, we propose an efficient temporal action detector (ETAD) that can be trained directly on video frames by introducing a novel sampling mechanism. First, to address where to sample in TAD, we propose snippet-level sampling and proposal-level sampling, based on the observation that performance saturates at a small number of snippets/proposals. These sampling strategies exploit the redundancy in the current detection framework, and thus substantially reduce the computation cost and enable end-to-end training on long untrimmed videos without harming performance. Second, to address how to sample in TAD, we comprehensively study various sampling approaches and find that random sampling and determinantal point process (DPP) sampling work best empirically. Our sampling-based ETAD achieves state-of-the-art performance on TAD benchmarks with remarkable efficiency. With end-to-end training, ETAD reaches 38.25% average mAP on ActivityNet-1.3. With pre-extracted features, ETAD needs only 6 minutes of training time and 1.23 GB of memory while still reaching 37.78% average mAP. Code will be available.
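The two sampling levels named in the abstract can be illustrated with a minimal sketch of the random-sampling variant. This is not the authors' implementation: the function names, the sampling ratios, and the exhaustive (start, end) proposal enumeration are assumptions made only for illustration, and DPP sampling is not shown.

```python
import numpy as np

def sample_snippets(num_snippets, ratio, rng):
    """Uniformly pick a sorted subset of snippet indices, without replacement.
    `ratio` is a hypothetical hyperparameter, not a value from the paper."""
    k = max(1, int(round(num_snippets * ratio)))
    return np.sort(rng.choice(num_snippets, size=k, replace=False))

def all_proposals(num_snippets):
    """Enumerate every (start, end) snippet pair with start < end.
    Assumed here as a stand-in for a dense proposal set."""
    starts, ends = np.triu_indices(num_snippets, k=1)
    return np.stack([starts, ends], axis=1)

def sample_proposals(proposals, ratio, rng):
    """Uniformly keep a fraction of the candidate proposals."""
    k = max(1, int(round(len(proposals) * ratio)))
    idx = rng.choice(len(proposals), size=k, replace=False)
    return proposals[idx]

rng = np.random.default_rng(0)
snippet_idx = sample_snippets(128, 0.3, rng)   # snippet-level sampling
proposals = all_proposals(16)                  # 16 * 15 / 2 candidate pairs
kept = sample_proposals(proposals, 0.1, rng)   # proposal-level sampling
```

In a sketch like this, only the sampled snippets would be passed through the expensive video backbone, and only the kept proposals would be refined and supervised, which is where the compute savings would come from.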
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Applications (eg, speech processing, computer vision, NLP)
TL;DR: We propose to alleviate the efficiency issue in TAD with a sampling mechanism, and we study two questions in detail: where to sample and how to sample in TAD.