Bringing Saccades and Fixations into Self-supervised Video Representation Learning

Published: 01 Feb 2023, Last Modified: 13 Feb 2023. Submitted to ICLR 2023. Readers: Everyone
Keywords: Self-supervised learning, video self-supervised learning, bio-inspired
TL;DR: In this paper, we propose a self-supervised video representation learning method by taking inspiration from cognitive science and neuroscience on human visual perception.
Abstract: In this paper, we propose a self-supervised video representation learning (video SSL) method by taking inspiration from cognitive science and neuroscience on human visual perception. Different from previous methods that mainly start from the inherent properties of videos, we argue that humans learn to perceive the world, in the absence of labels, through self-awareness of the semantic change or consistency in the input stimuli, accompanied by representation reorganization during post-learning rest periods. To this end, we first exploit the presence of saccades as an indicator of semantic change in a contrastive learning framework to mimic this self-awareness in human representation learning, where the saccades are generated without eye-tracking data. Second, we model semantic consistency by minimizing the prediction error between the predicted and the true state of another time point during a fixation. Third, we incorporate prototypical contrastive learning to reorganize the learned representations such that perceptually similar representations are associated more closely. Compared to previous counterparts, our method can capture finer-grained semantics from video instances, and the associations among similar instances are further strengthened. Experiments show that the proposed bio-inspired video SSL method significantly improves the Top-1 video retrieval accuracy on UCF101 and achieves superior performance on downstream tasks such as action recognition under comparable settings.
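
The abstract describes three training objectives: a saccade-driven contrast between clips, a fixation-consistency prediction error, and a prototype-based reorganization of the representation space. The following is a minimal PyTorch sketch of how such a combined objective could look; it is not the authors' implementation. The encoder `f`, predictor head `g`, learnable `prototypes`, hard nearest-prototype assignment, equal loss weighting, and all tensor shapes are assumptions made purely for illustration.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(q, pos, neg, tau=0.07):
    """InfoNCE-style loss: `pos` is the within-fixation positive, `neg` the across-saccade negatives."""
    q, pos, neg = (F.normalize(x, dim=-1) for x in (q, pos, neg))
    l_pos = (q * pos).sum(dim=-1, keepdim=True)           # (B, 1) similarity to the positive
    l_neg = q @ neg.t()                                    # (B, B) similarities to across-saccade clips
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)  # positive sits at index 0
    return F.cross_entropy(logits, labels)

def prototypical_loss(z, prototypes, tau=0.1):
    """Prototype assignment loss; hard nearest-prototype pseudo-labels are an assumption here."""
    z = F.normalize(z, dim=-1)
    c = F.normalize(prototypes, dim=-1)
    logits = z @ c.t() / tau
    target = logits.detach().argmax(dim=1)                 # assumed hard assignment to the nearest prototype
    return F.cross_entropy(logits, target)

def total_loss(f, g, prototypes, fix_a, fix_b, sacc):
    """
    fix_a, fix_b: two clips drawn from the same simulated fixation (semantic consistency)
    sacc:         a clip separated from fix_a by a simulated saccade (semantic change)
    """
    z_a, z_b, z_s = f(fix_a), f(fix_b), f(sacc)
    l_sacc = contrastive_loss(z_a, z_b, z_s)               # (1) saccade-aware contrast
    l_fix = F.mse_loss(F.normalize(g(z_a), dim=-1),        # (2) predict the other fixation state and
                       F.normalize(z_b, dim=-1).detach())  #     minimize the prediction error
    l_proto = prototypical_loss(z_a, prototypes)           # (3) prototype-based reorganization
    return l_sacc + l_fix + l_proto

if __name__ == "__main__":
    B, D = 8, 128
    f = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 16 * 32 * 32, D))  # stand-in encoder
    g = torch.nn.Linear(D, D)                                                           # predictor head
    prototypes = torch.nn.Parameter(torch.randn(64, D))                                 # learnable prototypes
    fix_a, fix_b, sacc = (torch.randn(B, 3, 16, 32, 32) for _ in range(3))
    loss = total_loss(f, g, prototypes, fix_a, fix_b, sacc)
    loss.backward()
    print(loss.item())
```

The sketch only mirrors the structure suggested by the abstract: within-fixation clips serve as positives, across-saccade clips as negatives, and a stop-gradient target is used for the fixation prediction term; how saccades are actually simulated and how prototypes are formed is described in the paper itself.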
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Unsupervised and Self-supervised learning
Supplementary Material: zip