Self-supervised Temporal Learning

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Withdrawn Submission
Abstract: Self-supervised learning (SSL) has shown a powerful ability to learn discriminative representations for various visual, audio, and video applications. However, most recent work still focuses on different paradigms of spatial-level SSL for video representations; how to learn the inherent representation along the temporal dimension in a self-supervised manner remains unexplored. In this work we propose self-supervised temporal learning (SSTL), which aims to learn spatial-temporal invariance. Inspired by spatial contrastive SSL, we show that significant improvement can be achieved with the proposed temporal contrastive learning approach, which comprises three novel and efficient modules: temporal augmentations, a temporal memory bank, and the SSTL loss. The temporal augmentations include three operators -- temporal crop, temporal dropout, and temporal jitter. Beyond the contrastive paradigm, we observe that temporal content varies across the layers of the temporal pyramid. SSTL extends the upper bound of current SSL approaches by $\sim$6% on well-known video classification tasks and, surprisingly, improves the current state-of-the-art approaches by $\sim$100% on some well-known video retrieval tasks. The code of SSTL is released with this draft, hoping to nourish the progress of the booming self-supervised learning community.
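The abstract names three temporal augmentation operators (temporal crop, temporal dropout, temporal jitter) but does not define them. The following is a minimal sketch of plausible implementations, assuming a video is a frame-major NumPy array of shape (T, H, W, C); the function names, signatures, and exact sampling choices here are illustrative assumptions, not taken from the released SSTL code.

```python
import numpy as np

def temporal_crop(frames, crop_len, rng):
    # Sample a contiguous clip of `crop_len` frames at a random start.
    start = rng.integers(0, len(frames) - crop_len + 1)
    return frames[start:start + crop_len]

def temporal_dropout(frames, drop_prob, rng):
    # Independently drop frames with probability `drop_prob`,
    # keeping at least one frame so the clip is never empty.
    keep = rng.random(len(frames)) >= drop_prob
    if not keep.any():
        keep[rng.integers(0, len(frames))] = True
    return frames[keep]

def temporal_jitter(frames, max_shift, rng):
    # Perturb each frame index by a small random offset (frames may
    # repeat or be skipped), clipped to the valid index range.
    idx = np.arange(len(frames))
    shift = rng.integers(-max_shift, max_shift + 1, size=len(frames))
    return frames[np.clip(idx + shift, 0, len(frames) - 1)]
```

In a contrastive setup, two independently augmented views of the same clip would form a positive pair, with other clips in the (temporal) memory bank serving as negatives.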
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=gGx8WUEQjj
