RMem: Restricted Memory Banks Improve Video Object Segmentation

Published: 01 Jan 2024, Last Modified: 10 Nov 2024, CVPR 2024, CC BY-SA 4.0
Abstract: With recent video object segmentation (VOS) benchmarks evolving to challenging scenarios, we revisit a simple but overlooked strategy: restricting the size of memory banks. This diverges from the prevalent practice of expanding memory banks to accommodate extensive historical information. Our specially designed “memory deciphering” study offers a pivotal insight underpinning such a strategy: expanding memory banks, while seemingly beneficial, actually increases the difficulty for VOS modules to decode relevant features due to the confusion from redundant information. By restricting memory banks to a limited number of essential frames, we achieve a notable improvement in VOS accuracy. This process balances the importance and freshness of frames to maintain an informative memory bank within a bounded capacity. Additionally, restricted memory banks reduce the training-inference discrepancy in memory lengths compared with continuous expansion. This fosters new opportunities in temporal reasoning and enables us to introduce the previously overlooked “temporal positional embedding.” Finally, our insights are embodied in “RMem” (“R” for restricted), a simple yet effective VOS modification that excels at challenging VOS scenarios and establishes a new state of the art for object state changes (on the VOST dataset) and long videos (on the Long Videos dataset). Our code and demos are available at https://restricted-memory.github.io/.
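To make the idea of a bounded memory bank concrete, the following is a minimal sketch and not the RMem implementation. It assumes a hypothetical per-frame `importance` score, a hypothetical `freshness_weight` parameter, and an illustrative freshness term of `1 / (1 + age)`; the abstract does not specify the actual scoring rule. The class names `MemoryFrame` and `RestrictedMemoryBank` are likewise invented for illustration. The sketch keeps the first frame (assumed here to be the annotated reference) and, once capacity is exceeded, evicts the remaining frame with the lowest combined score.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryFrame:
    index: int         # position of the frame in the video
    importance: float  # hypothetical relevance score (assumption)


@dataclass
class RestrictedMemoryBank:
    capacity: int                  # maximum number of frames kept
    freshness_weight: float = 1.0  # hypothetical trade-off parameter
    frames: list = field(default_factory=list)

    def __post_init__(self):
        if self.capacity < 1:
            raise ValueError("capacity must be at least 1")

    def _score(self, frame, current_index):
        # Illustrative score: importance plus a term that decays with age.
        age = current_index - frame.index
        return frame.importance + self.freshness_weight / (1.0 + age)

    def add(self, frame):
        self.frames.append(frame)
        if len(self.frames) <= self.capacity:
            return
        # Never evict the first frame (assumed to be the annotated reference).
        candidates = self.frames[1:]
        victim = min(candidates, key=lambda f: self._score(f, frame.index))
        self.frames.remove(victim)


# Usage: stream six frames through a bank that holds at most three.
bank = RestrictedMemoryBank(capacity=3)
for idx, imp in enumerate([1.0, 0.9, 0.1, 0.5, 0.2, 0.3]):
    bank.add(MemoryFrame(index=idx, importance=imp))
kept = [f.index for f in bank.frames]
print(kept)
```

The bank never grows beyond `capacity`, so the memory length seen at inference stays close to what a model would see during training, which is the discrepancy the abstract refers to.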