PMNet: Predator-Mimicking Network for Video Camouflaged Object Detection

Published: 01 Jan 2025, Last Modified: 25 Jul 2025, IEEE Trans. Multim. 2025, CC BY-SA 4.0
Abstract: A predator can quickly correct a misjudged decision and hunt a camouflaged target by analyzing its movement. This decision compensation and movement analysis are closely tied to temporal and spatial information. The same situation arises in video camouflaged object detection (VCOD), where the captured temporal information may be misjudged and the spatial information tends to be inaccurate in complex scenes. Two key questions therefore arise in the VCOD task: how can a model cope with misjudged temporal information, and how can spatial features interact with temporal information to understand dynamic scenes? To this end, we propose a predator-mimicking network (PMNet) equipped with a selective temporal alignment module (STAM) and a temporal-spatial feedback module (T-SFM). The STAM alleviates the influence of a misjudged motion trajectory through our adaptive selection mechanism. In the T-SFM, the temporal information serves as self-knowledge that assists and interacts with spatial features, enabling the model to detect the camouflaged object effectively. Experimental results demonstrate that our method achieves state-of-the-art performance on VCOD benchmarks. Furthermore, our model generalizes to the video salient object detection (VSOD) task, where it also outperforms existing state-of-the-art methods. The source code will be publicly available at https://github.com/LiuTingWed/CriDiff.