Keywords: LVLM Evaluation, Safety, Agent
Abstract: Large vision-language models (LVLMs) exhibit remarkable capabilities in vision and language tasks but face significant safety challenges, which undermine their reliability in real-world applications. Efforts have been made to build LVLM safety evaluation benchmarks to uncover their vulnerabilities. However, existing benchmarks are hindered by labor-intensive construction processes and static complexity that fails to keep pace with rapidly evolving model architectures and emerging risks. To address these limitations, we propose VLSafetyBencher, the first multi-agent system designed to automate LVLM safety benchmarking, which introduces four collaborative agents: preprocessing, cross-modal processing, augmentation, and post-sampling agents. With an optimized sample-level sampling algorithm, VLSafetyBencher can be conveniently applied to benchmark construction, benchmark updating, and sample evaluation. We conduct experiments on benchmark construction and updating tasks with VLSafetyBencher and evaluate an extensive set of LVLMs. Our results demonstrate that the automatically generated dataset effectively distinguishes model safety, with a safety-rate disparity of nearly 70% between the most and least safe models. Ablation analyses further validate VLSafetyBencher's effectiveness.
Supplementary Material: pdf
Primary Area: datasets and benchmarks
Submission Number: 242