Stochastic Context Consistency Reasoning for Domain Adaptive Object Detection

Published: 20 Jul 2024, Last Modified: 21 Jul 2024, Venue: MM2024 Poster, License: CC BY 4.0
Abstract: Domain Adaptive Object Detection (DAOD) aims to adapt a detector trained on a labeled source domain to an unlabeled target domain. Recent advances leverage a self-training framework in which a student model learns target-domain knowledge from pseudo labels generated by a teacher model. Despite their success, such category-level consistency supervision suffers from low-quality pseudo labels. To mitigate this problem, we propose a stochastic context consistency reasoning (SOCCER) network built on the self-training framework. First, we introduce a stochastic complementary masking (SCM) module that generates complementary masked images, preventing the network from over-relying on specific visual clues. Second, we design an inter-changeable context consistency reasoning (Inter-CCR) module, which constructs an inter-context consistency paradigm to capture texture and contour details in the target domain by aligning the student model's predictions for complementary masked images. In addition, we develop an intra-changeable context consistency reasoning (Intra-CCR) module, which constructs an intra-context consistency paradigm to strengthen the use of context relations by supervising the student model's predictions with pseudo labels. Experimental results on three DAOD benchmarks demonstrate that our method outperforms current state-of-the-art methods by a large margin. Code is released in the supplementary materials.
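The complementary masking idea in the abstract can be illustrated with a minimal sketch. This is not the authors' SCM module: the patch-grid masking, the `patch` and `ratio` parameters, and the function names `complementary_masks`, `mask_image`, and `consistency_gap` are all assumptions made for illustration, and the mean squared difference stands in for whatever alignment loss the paper actually uses.

```python
import numpy as np


def complementary_masks(height, width, patch=32, ratio=0.5, rng=None):
    """Return two boolean (H, W) masks that are exact complements.

    Hypothetical sketch: each patch x patch cell is kept in the first
    view with probability `ratio` and in the second view otherwise.
    """
    rng = np.random.default_rng() if rng is None else rng
    grid_h = -(-height // patch)  # ceiling division
    grid_w = -(-width // patch)
    keep = rng.random((grid_h, grid_w)) < ratio
    # Upsample the patch grid to pixel resolution, then crop to the image.
    mask = keep.repeat(patch, axis=0).repeat(patch, axis=1)[:height, :width]
    return mask, ~mask


def mask_image(image, mask):
    """Zero out the pixels of an (H, W, C) image where `mask` is False."""
    return image * mask[..., None]


def consistency_gap(pred_a, pred_b):
    """Assumed stand-in for a consistency loss: the mean squared
    difference between the predictions for the two masked views."""
    return float(np.mean((np.asarray(pred_a) - np.asarray(pred_b)) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((70, 100, 3))
    mask_a, mask_b = complementary_masks(70, 100, patch=32, rng=rng)
    view_a, view_b = mask_image(image, mask_a), mask_image(image, mask_b)
    # Every pixel is visible in exactly one of the two views.
    print(np.allclose(view_a + view_b, image))
```

In a self-training setup, the two views would be passed through the student model and a loss like `consistency_gap` applied to its two sets of predictions; how predictions are matched across views is left out here because the abstract does not specify it.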
Primary Subject Area: [Content] Media Interpretation
Relevance To Conference: Our work contributes to the multimedia community by addressing the domain shift challenge in object detection, a crucial component of many multimedia applications. In multimodal tasks, traditional object detectors often perform poorly because of the domain shift between labeled source domains and unlabeled target domains, which makes it difficult to extract visual-modality information effectively and to meet the demands of multimodal applications. Our proposed Stochastic Context Consistency Reasoning (SOCCER) network tackles this problem by learning context correlation knowledge in the target domain. Through extensive experiments on three benchmarks, we demonstrate that SOCCER outperforms current state-of-the-art methods by a large margin.
Supplementary Material: zip
Submission Number: 841