Spatio-Temporal Pixel-Level Contrastive Learning-based Source-Free Domain Adaptation for Video Semantic Segmentation

Published: 2023, Last Modified: 07 Nov 2024, CVPR 2023, CC BY-SA 4.0
Abstract: Unsupervised Domain Adaptation (UDA) of semantic segmentation transfers labeled source knowledge to an unlabeled target domain by relying on access to both the source and target data. However, access to source data is often restricted or infeasible in real-world scenarios, which makes UDA less practical. To address this, recent works have explored the Source-Free Domain Adaptation (SFDA) setup, which aims to adapt a source-trained model to the target domain without accessing source data. Still, existing SFDA approaches use only image-level information for adaptation, making them sub-optimal in video applications. This paper studies SFDA for Video Semantic Segmentation (VSS), where temporal information is leveraged to address video adaptation. Specifically, we propose Spatio-Temporal Pixel-Level (STPL) contrastive learning, a novel method that takes full advantage of spatio-temporal information to better tackle the absence of source data. STPL explicitly learns semantic correlations among pixels in the spatio-temporal space, providing strong self-supervision for adaptation to the unlabeled target domain. Extensive experiments show that STPL achieves state-of-the-art performance on VSS benchmarks compared to current UDA and SFDA approaches. Code is available at: https://github.com/shaoyuanlo/STPL
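To make the idea of pixel-level contrastive learning more concrete, the following is a minimal, generic sketch and not the STPL implementation from the linked repository. It assumes pixel embeddings have already been sampled (for example from two adjacent frames) and that each carries a pseudo-label predicted by the model itself. The function name `pixel_contrastive_loss`, the temperature value, and the use of pseudo-labels to define positives are all illustrative assumptions.

```python
import numpy as np

def pixel_contrastive_loss(feats, labels, temperature=0.1):
    """Generic supervised-contrastive-style loss over pixel embeddings.

    feats:  (N, D) array of pixel embeddings (assumed: sampled from one or
            more frames and stacked along the first axis).
    labels: (N,) integer pseudo-labels (assumed: model predictions, since
            no target ground truth exists in the source-free setting).
    """
    feats = np.asarray(feats, dtype=np.float64)
    labels = np.asarray(labels)
    n = feats.shape[0]
    if n < 2:
        return 0.0

    # Cosine similarity between every pair of pixels, scaled by temperature.
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = feats @ feats.T / temperature

    # A pixel is never its own positive or negative.
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)

    # Numerically stable log-softmax over all other pixels.
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - row_max - log_denom

    # Positives: other pixels that share the anchor's pseudo-label.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0
    if not valid.any():
        return 0.0

    # Average log-probability of positives per anchor, then over anchors.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[valid] / pos_count[valid]).mean())
```

In this sketch, the spatio-temporal aspect would come from how `feats` is assembled: stacking embeddings from the current and a previous frame lets same-class pixels across time act as positives. How STPL actually fuses frames and selects pairs is specified in the paper and repository, not here.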