VALUE-Bench: A Comprehensive Benchmark for Evaluating Large Vision-Language Models on Multimodal Ethical Understanding

ACL ARR 2024 June Submission3467 Authors

16 Jun 2024 (modified: 06 Jul 2024) · CC BY 4.0
Abstract: Multimodal ethical understanding refers to morally analyzing and discerning ethical scenarios described jointly in visual and natural language contexts. While many capabilities of large vision-language models (LVLMs) have been evaluated, their capacity for multimodal ethical understanding remains largely unexamined. In this paper, we propose VALUE-Bench, a comprehensive benchmark that rigorously evaluates the multimodal ethical understanding of LVLMs. Rather than focusing on surface-level descriptions of images and text, VALUE-Bench progressively evaluates models along four dimensions: ethical understanding, robustness, reliability, and resistance to misuse. We collect 6 datasets spanning 10 multimodal ethical understanding tasks drawn from real-world ethical scenarios (e.g., harmful, hateful, offensive, humiliating, violent, misogynistic, stereotyping, and objectifying content). Moreover, we provide an in-depth analysis of the multimodal ethical understanding of existing English and Chinese LVLMs. By offering a nuanced view of models' ethical understanding and ethical decision-making in both English and Chinese contexts, VALUE-Bench strengthens the evaluation of LVLMs' multimodal ethical understanding.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking; evaluation methodologies; evaluation
Languages Studied: English, Chinese
Submission Number: 3467