Space Time Recurrent Memory Network

29 Sept 2021 (modified: 13 Feb 2023) · ICLR 2022 Conference Withdrawn Submission · Readers: Everyone
Abstract: Transformers have recently become popular for learning and inference in the spatial-temporal domain. However, their performance relies on storing and applying attention to the feature tensor of each frame, so their space and time complexity grow linearly with sequence length, which becomes costly for long videos. We propose a novel visual memory network architecture for learning and inference in the spatial-temporal domain. Our memory network maintains a fixed set of memory slots, and we explore different designs for writing new information into the memory, combining information across memory slots, and discarding old information. We benchmark this architecture on video object segmentation and video prediction. Our experiments show that our memory architecture achieves competitive results compared to state-of-the-art transformer-based methods while maintaining constant memory capacity independent of sequence length.
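The abstract describes writing new information into a fixed set of slots, combining slot contents, and discarding old information, all while keeping memory capacity constant. Below is a minimal PyTorch sketch of one such fixed-slot recurrent memory; the class name `FixedSlotMemory`, the attention-based write, and the sigmoid forget gate are illustrative assumptions, since the abstract does not give the paper's concrete update rules.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FixedSlotMemory(nn.Module):
    """Fixed-capacity recurrent memory: S slots of dimension D, updated per frame.

    Hypothetical sketch -- the write, combine, and forget rules here are
    illustrative placeholders, not the paper's actual method.
    """

    def __init__(self, num_slots: int, dim: int):
        super().__init__()
        self.init_slots = nn.Parameter(torch.randn(num_slots, dim) * 0.02)
        self.to_q = nn.Linear(dim, dim)  # slots attend to the incoming frame
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.forget_gate = nn.Linear(2 * dim, dim)  # decides what old content to discard

    def write(self, memory: torch.Tensor, frame_feats: torch.Tensor) -> torch.Tensor:
        """Fold one frame's features (N, D) into the memory; shape stays (S, D)."""
        q = self.to_q(memory)                                      # (S, D)
        k, v = self.to_k(frame_feats), self.to_v(frame_feats)      # (N, D)
        attn = F.softmax(q @ k.t() / k.shape[-1] ** 0.5, dim=-1)   # (S, N)
        update = attn @ v                                          # new info routed to each slot
        g = torch.sigmoid(self.forget_gate(torch.cat([memory, update], dim=-1)))
        return g * update + (1.0 - g) * memory                     # gated overwrite of old content

    def read(self, memory: torch.Tensor, queries: torch.Tensor) -> torch.Tensor:
        """Attend from query features (M, D) into the memory to produce a (M, D) readout."""
        attn = F.softmax(
            self.to_q(queries) @ self.to_k(memory).t() / memory.shape[-1] ** 0.5, dim=-1
        )
        return attn @ self.to_v(memory)
```

A usage sketch of the constant-capacity property the abstract claims: however many frames are processed, the recurrent state keeps a fixed shape.

```python
mem = FixedSlotMemory(num_slots=16, dim=256)
state = mem.init_slots
for frame_feats in torch.randn(100, 4096, 256):  # 100 frames, 4096 tokens each (hypothetical sizes)
    state = mem.write(state, frame_feats)        # state stays (16, 256) throughout
```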
One-sentence Summary: This paper introduces a new spatial-temporal memory network that serves multiple video tasks.
Supplementary Material: zip