Keywords: Camouflage Scenes, LVLM, Camouflage Quantification, Benchmark
TL;DR: The first benchmark for LVLMs in camouflage scene understanding
Abstract: Current camouflaged object detection methods predominantly follow discriminative segmentation paradigms and rely heavily on the predefined categories present in their training data, limiting their generalization to unseen or emerging camouflaged objects. This limitation is further compounded by the labor-intensive and time-consuming nature of collecting camouflage imagery. Although Large Vision-Language Models (LVLMs) show potential to address these issues with their powerful generative capabilities, their understanding of camouflage scenes remains insufficient. To bridge this gap, we introduce MMCSBench, the first comprehensive multimodal benchmark designed to evaluate and advance LVLM capabilities in camouflage scenes. MMCSBench comprises 22,537 images and 76,843 corresponding image-text pairs across five fine-grained camouflage tasks. Additionally, we propose a new task, Camouflage Efficacy Assessment (CEA), aimed at quantitatively evaluating the camouflage effectiveness of objects in images and enabling automated collection of camouflage images from large-scale databases. Extensive experiments on 26 LVLMs reveal significant shortcomings in models' ability to perceive and interpret camouflage scenes. These findings highlight the fundamental differences between natural and camouflaged visual inputs, offering insights for future research on advancing LVLM capabilities in this challenging domain.
Croissant File: json
Dataset URL: https://kaggle.com/datasets/f2218284b51011e4e27d4c4d8b41eff771d0ec734c535a7a0b94bb2a02058a46
Code URL: https://github.com/zhangjinCV/MMCSBench
Primary Area: Datasets & Benchmarks for applications in language modeling and vision language modeling
Submission Number: 3