More observation leads to more clarity: Multi-view collaboration network for camouflaged object detection
Abstract: Existing Camouflaged Object Detection (COD) methods often rely on single-view feature perception, which struggles to fully capture camouflaged objects under environmental interference such as background clutter, lighting variations, and viewpoint changes. To address this, we propose the multi-view collaboration network (MCNet), inspired by the multi-perspective strategies humans use to analyze complex scenes. MCNet extracts features from multiple views of the input for enhanced feature extraction. The global perception module processes the original, far, and near views, applying large-kernel convolutions of different sizes and multi-head attention for global feature embedding. In parallel, the local perception module processes the tilted, projected, and color-jittered views, extracting fine-grained local features through multi-branch depthwise convolutions and dilated convolutions. To enable deep interaction between global and local features, we introduce the hybrid interactive module, which models correlations across multi-view features and fuses them adaptively. For feature decoding, the dynamic pyramid shrinkage module combines dynamic gated convolutions with a pyramid shrinkage mechanism, progressively aggregating semantic features via hierarchical shrinking and group fusion strategies. Experimental results on popular COD benchmark datasets show that MCNet outperforms 18 state-of-the-art methods.
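To make the multi-view input concrete, the sketch below generates the six views named in the abstract (original, far, near, tilted, projected, color-jittered) from a single image tensor. This is a minimal illustration assuming PyTorch/torchvision; the function name `make_views` and all view parameters (scale factors, tilt angle, warp corners, jitter strengths) are our own assumptions, not MCNet's actual settings.

```python
import torch
import torch.nn.functional as F
import torchvision.transforms.functional as TF


def make_views(image: torch.Tensor) -> dict:
    """Generate the six input views described in the abstract from one image.

    `image` is a (B, 3, H, W) float tensor in [0, 1]. All parameters below
    are illustrative guesses, not the paper's settings.
    """
    B, C, H, W = image.shape

    # Global-branch views: original, far, near.
    # "Far" is simulated by downscaling and upscaling back, which discards
    # fine detail as if the scene were viewed from a distance.
    far = F.interpolate(
        F.interpolate(image, scale_factor=0.5, mode="bilinear", align_corners=False),
        size=(H, W), mode="bilinear", align_corners=False)
    # "Near" is simulated by enlarging a center crop of the image.
    crop = image[:, :, H // 4 : 3 * H // 4, W // 4 : 3 * W // 4]
    near = F.interpolate(crop, size=(H, W), mode="bilinear", align_corners=False)

    # Local-branch views: tilted, projected, color-jittered.
    tilted = TF.rotate(image, angle=15.0)  # small in-plane tilt
    projected = TF.perspective(            # mild perspective (projection) warp
        image,
        startpoints=[[0, 0], [W - 1, 0], [W - 1, H - 1], [0, H - 1]],
        endpoints=[[10, 5], [W - 11, 0], [W - 1, H - 6], [5, H - 1]],
    )
    jittered = TF.adjust_brightness(TF.adjust_saturation(image, 1.3), 0.8)

    return {
        "original": image, "far": far, "near": near,
        "tilted": tilted, "projected": projected, "jittered": jittered,
    }


# Usage: six spatially aligned views, ready to feed the global perception
# branch (original/far/near) and the local perception branch (the rest).
views = make_views(torch.rand(1, 3, 352, 352))
print({name: tuple(v.shape) for name, v in views.items()})
```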