Targeted Test Time Adaptation of Memory Networks for Video Object Segmentation

Isidore Dubuisson, Damien Muselet, Christophe Ducottet, Jochen Lang

Published: 01 Jan 2025, Last Modified: 14 Jul 2025 | VISIGRAPP (3): VISAPP 2025 | CC BY-SA 4.0

Abstract: Semi-automatic Video Object Segmentation (SVOS) aims to segment a few objects in a video based on annotations of those objects in the first frame only. State-of-the-art methods rely on offline training on a large dataset, which may lack samples and details directly applicable to the current test video. Common solutions either use test-time adaptation to finetune the offline model on the single annotated frame or rely on complex semi-supervised strategies. In this paper, we introduce targeted test-time adaptation of memory-based SVOS, which provides the benefits of finetuning with much less learning effort. Our method targets specific parts of the model to improve results while maintaining the robustness of the offline training. We find that targeting the bottleneck features and the masks that are saved in memory provides substantial benefits. The evaluation of our method shows a significant improvement for video segmentation on the DAVIS16 and DAVIS17 datasets.