Omni-SafetyBench: A Benchmark for Safety Evaluation of Audio-Visual Large Language Models


ACL ARR 2026 January Submission 2336 Authors

02 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0

Keywords: omni-modal large language models, safety, evaluation

Abstract: Omni-modal Large Language Models (OLLMs) that integrate visual, auditory, and textual processing face severe safety risks. They exhibit fragile defenses against joint audio-visual harmful inputs and inconsistent safety performance across modalities, which enables simple modality-switching jailbreaks. However, existing safety benchmarks fail to assess these risks comprehensively because they lack joint audio-visual samples, have limited modality coverage, and provide no parallel test cases for evaluating cross-modal consistency. To address these gaps, we introduce Omni-SafetyBench, the first comprehensive parallel benchmark for OLLM safety evaluation, featuring 23,328 test instances across 24 modality variations derived from 972 seed samples. Because complex inputs pose comprehension challenges and cross-modal consistency is critical for OLLM safety, we propose tailored metrics: a Safety-score based on Conditional Attack Success Rate (C-ASR) and Conditional Refusal Rate (C-RR), and a Cross-Modal Safety Consistency score (CMSC-score). Evaluating 10 state-of-the-art OLLMs reveals severe vulnerabilities: only 3 models exceed 0.6 on both metrics, and safety degrades sharply for audio-visual inputs. Furthermore, evaluating existing safety alignment methods on Omni-SafetyBench reveals fundamental challenges in OLLM safety alignment, underscoring an urgent need for further research in this domain. Code and data are available in anonymous repositories.
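The abstract names the conditional metrics but does not give their formulas. The following is a minimal Python sketch, assuming that "conditional" means the rates are computed only over responses where the model actually comprehended the input. The `Judgement` record, its fields, and `conditional_rates` are hypothetical names, and the way the paper combines C-ASR and C-RR into the Safety-score (and how the CMSC-score is defined) is not reproduced here. The sketch also checks the benchmark size: 972 seeds × 24 modality variations = 23,328 instances, consistent with each seed appearing in every variation.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Judgement:
    """Hypothetical per-response judgement; field names are illustrative only."""
    understood: bool  # assumed: the model comprehended the input
    attacked: bool    # assumed: the response complied with the harmful request
    refused: bool     # assumed: the response explicitly refused


def conditional_rates(judgements: Iterable[Judgement]) -> Tuple[float, float]:
    """Return (C-ASR, C-RR) computed only over understood responses.

    Assumption: "conditional" means conditioned on comprehension; the
    paper's exact definitions may differ.
    """
    understood = [j for j in judgements if j.understood]
    if not understood:
        raise ValueError("no understood responses; conditional rates are undefined")
    n = len(understood)
    c_asr = sum(j.attacked for j in understood) / n  # share of understood inputs that were jailbroken
    c_rr = sum(j.refused for j in understood) / n    # share of understood inputs that were refused
    return c_asr, c_rr


# Benchmark size stated in the abstract.
SEED_SAMPLES = 972
MODALITY_VARIATIONS = 24
TOTAL_INSTANCES = SEED_SAMPLES * MODALITY_VARIATIONS  # 23,328

if __name__ == "__main__":
    sample = [
        Judgement(understood=True, attacked=True, refused=False),
        Judgement(understood=True, attacked=False, refused=True),
        Judgement(understood=False, attacked=False, refused=False),  # excluded from both rates
        Judgement(understood=True, attacked=False, refused=True),
    ]
    c_asr, c_rr = conditional_rates(sample)
    print(f"C-ASR={c_asr:.3f}, C-RR={c_rr:.3f}, instances={TOTAL_INSTANCES}")
```

Conditioning on comprehension keeps a model from looking "safe" merely because it failed to parse a complex audio-visual input; in this toy run the misunderstood response is dropped, so both rates are computed over three responses.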

Paper Type: Long

Research Area: Safety and Alignment in LLMs

Research Area Keywords: safety and alignment, benchmarking

Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources, Data analysis

Languages Studied: English

Submission Number: 2336
