Test-Time Training on Video StreamsDownload PDF

22 Sept 2022 (modified: 12 Mar 2024)ICLR 2023 Conference Withdrawn SubmissionReaders: Everyone
Abstract: We investigate visual generalization video streams instead of independent images, since the former is closer to the smoothly changing environments where natural agents operate. Traditionally, single-image models are tested on videos as collections of unordered frames. We instead test on each video in temporal order, making a prediction on the current frame before the next arrives, after training at test time on frames from the recent past. To perform test-time training without ground truth labels, we leverage recent advances in masked autoencoders for self-supervision. We improve performance on various real-world applications. We also discover that forgetting can be beneficial for test-time training, in contrast to the common belief in the continual learning community that it is harmful.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Unsupervised and Self-supervised learning
Supplementary Material: zip
Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 1 code implementation](https://www.catalyzex.com/paper/arxiv:2307.05014/code)
5 Replies

Loading