USB: A COMPREHENSIVE AND UNIFIED SAFETY EVALUATION BENCHMARK FOR MULTIMODAL LARGE LANGUAGE MODELS

ACL ARR 2026 January Submission 9406 Authors

06 Jan 2026 (modified: 20 Mar 2026) · License: CC BY 4.0
Keywords: benchmarking, safety and alignment, multimodality
Abstract: Despite their rapid advancement, Multimodal Large Language Models (MLLMs) remain vulnerable to diverse safety risks. Current benchmarks fail to provide reliable assessments due to limited risk coverage, insufficient scale, and the oversight of complex modality combinations (e.g., cross-modal risks). To address this, we introduce the Unified Safety Benchmark (USB), a comprehensive framework covering 61 risk categories across four distinct modality interactions. We first demonstrate that existing benchmarks—even when aggregated—leave significant coverage gaps. To bridge this, we design a sophisticated data synthesis pipeline that generates extensive complementary data, ensuring balanced coverage across all risk dimensions. Crucially, beyond evaluating vulnerability to harmful queries, USB pioneers the simultaneous assessment of model over-refusal on benign inputs. Extensive experimental results, obtained across 12 mainstream open-source MLLMs and 5 closed-source commercial MLLMs, demonstrate that existing MLLMs still struggle with the trade-off between avoiding vulnerabilities and over-refusal, and are more vulnerable to image-only or cross-modal risky inputs, highlighting the need for refined safety mechanisms. Warning: This paper contains unfiltered and potentially harmful content that may be offensive.
Paper Type: Long
Research Area: Safety and Alignment in LLMs
Research Area Keywords: benchmarking, safety and alignment, multimodality
Contribution Types: Model analysis & interpretability, Data resources, Data analysis
Languages Studied: English
Submission Number: 9406