A Weakly-Supervised Cross-Domain Query Framework for Video Camouflage Object Detection

Zelin Lu, Liang Xie, Xing Zhao, Binwei Xu, Haoran Liang, Ronghua Liang

Published: 2025, Last Modified: 17 Apr 2025. IEEE Trans. Circuits Syst. Video Technol. 2025. License: CC BY-SA 4.0

Abstract: Video Camouflage Object Detection (VCOD) is a crucial security technology that identifies camouflaged objects in videos, bolstering security measures across diverse applications. On the one hand, appearance-based VCOD methods struggle because camouflaged objects blend into their surroundings, and current VCOD methods typically rely on optical flow to represent motion information; over-reliance on accurate optical flow estimation makes these models fragile. On the other hand, well-annotated camouflaged video datasets are scarce, and the annotation process is time-consuming and labor-intensive, which severely constrains the development of this field. To address these issues, we propose a novel weakly-supervised framework for VCOD based on cross-domain querying of preceding and succeeding frames. Specifically, we propose a time-efficient and labor-saving manual annotation approach based on large visual models to rapidly generate pseudo-labels. Furthermore, we design a network based on Spatio-Temporal Memory (STM) that performs cross-modal feature querying of the current frame against preceding and succeeding frames to acquire useful information, thereby enhancing the focus on temporal information. Extensive experiments on two common VCOD datasets demonstrate the effectiveness of our method, which achieves state-of-the-art performance on challenging camouflaged video data.
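The abstract does not specify how the STM-based query is computed. As a rough illustration only, the sketch below shows a generic attention-style memory read in which features of the current frame query key/value features pooled from preceding and succeeding frames. The function names (`softmax`, `dot`, `memory_read`), the scaled dot-product similarity, and the flat list layout are all assumptions made for this example and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def memory_read(query_keys, memory_keys, memory_values):
    """Hypothetical memory read: each current-frame key attends over all
    memory positions (gathered from preceding and succeeding frames) and
    returns a similarity-weighted average of the memory values.

    query_keys:    list of key vectors, one per current-frame position
    memory_keys:   list of key vectors, one per memory position
    memory_values: list of value vectors aligned with memory_keys
    """
    # Scaling by sqrt(key dimension) is an assumption of this sketch.
    scale = math.sqrt(len(memory_keys[0]))
    value_dim = len(memory_values[0])
    readout = []
    for q in query_keys:
        weights = softmax([dot(q, k) / scale for k in memory_keys])
        readout.append([
            sum(w * v[d] for w, v in zip(weights, memory_values))
            for d in range(value_dim)
        ])
    return readout


# Toy memory: one position from a preceding frame, one from a succeeding frame.
memory_keys = [[10.0, 0.0], [0.0, 10.0]]
memory_values = [[1.0, 0.0], [0.0, 1.0]]

# The current-frame key matches the preceding-frame key, so the read-out
# is dominated by the preceding frame's value vector.
current_keys = [[10.0, 0.0]]
readout = memory_read(current_keys, memory_keys, memory_values)
```

In this toy setup the query simply retrieves the value stored at the most similar memory position. A real model would compute keys and values with learned encoders over full feature maps, which the abstract does not detail.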