Keywords: Image content moderation, safety, large vision-language model, potential violation element
Abstract: To manage the sheer volume of online image content, robust content moderation systems are essential, driving the development of specialized datasets and methods. However, current visual content moderation datasets are limited by pre-defined, fixed safety policies, restricting their applicability for evaluating and fine-tuning large vision-language models (LVLMs) under various real-world safety policies.
To address this gap, we introduce PVE-100, the first fine-grained, element-level dataset for visual content moderation, comprising 22k manually annotated samples and covering over 100 Potential Violation Elements (PVEs) spanning multiple dimensions.
With element-level annotations, PVE-100 offers the flexibility to evaluate and fine-tune models under customized safety policies.
Moreover, our experiments demonstrate that these fine-grained annotations can be used, simply yet effectively, to further enhance open-source LVLMs via a PVE perception objective during fine-tuning, and to augment closed-source models through a plug-and-play PVE perception expert.
Code and dataset will be publicly available upon acceptance.
Primary Area: foundation or frontier models, including LLMs
Submission Number: 3638