Learning from student's mistakes: Improving mean teacher for end-to-end semi-supervised video action detection

22 Sept 2022 (modified: 13 Feb 2023), ICLR 2023 Conference Withdrawn Submission, Readers: Everyone
Keywords: semi-supervised, activity detection, student-teacher, video understanding
Abstract: In this work, we focus on semi-supervised learning for video action detection. We present Enhanced Mean Teacher, a simple end-to-end student-teacher framework that relies on pseudo-labels to learn from unlabeled samples. A limited amount of data makes the teacher prone to unreliable boundaries when detecting spatio-temporal actions. We propose a novel auxiliary module that learns from the student’s mistakes on labeled samples and improves the spatio-temporal pseudo-labels generated by the teacher on the unlabeled set. The proposed framework uses spatial and temporal augmentations to generate pseudo-labels, and both classification and spatio-temporal consistencies are used to train the model. We evaluate our approach on two action detection benchmark datasets, UCF101-24 and JHMDB-21. On UCF101-24, our approach outperforms the supervised baseline by approximately 19% on f-mAP@0.5 and 25% on v-mAP@0.5. Using merely 10-15% of the annotations in UCF101-24, the proposed approach performs competitively with the supervised baseline trained on 100% of the annotations. We also evaluate Enhanced Mean Teacher on video object segmentation, demonstrating its ability to generalize to other tasks in the video domain.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Applications (eg, speech processing, computer vision, NLP)