Abstract: Highlights•Trainable temporal module atop image model effectively integrates motion sementic.•Video self-supervised learning combined with image model has great potential.•AdViSe framework greatly reduces training cost of video self-supervised learning.
Loading