PLURULE: A Challenging Benchmark for Detecting Rule Violations of Pluralistic Communities on Social Media
Keywords: Social Media, Content Moderation, Vision-Language Models (VLMs), Large Language Models (LLMs), Pluralism, Benchmark, Online Communities, Reddit, Norm Violation, Multimodal Learning, Computational Social Science, Trust and Safety, Online Governance, Community Norms, Social Computing, Platform Safety, Automated Moderation, Rule Violation Detection, Context-Aware Analysis, Norm Identification, Multiple-Choice QA, Multimodal Reasoning, Cross-Lingual Transfer, Safety Alignment, Zero-Shot Evaluation, Robustness, Context-Dependent Moderation, Semantic Clustering, Global vs. Local Norms, Moderator Labor, Community-Specific Rules, Digital Constitutionalism, Algorithmic Bias, Cultural Sensitivity, Scalable Moderation
Abstract: Social media are shifting towards pluralism — community-governed platforms where groups define their own norms. What violates the rules in one community may be perfectly acceptable in another. Can AI models help detect rule violations in such pluralistic communities? We formalize the task as a multiple-choice problem, mirroring how human moderators operate in the real world: given a comment and its surrounding context, identify which specific rule, if any, is violated. We introduce PLURULE, a multimodal, multilingual benchmark for detecting 17,313 rule violations across 2,419 Reddit communities, spanning 3,692 pluralistic rules in 10 languages. Using this benchmark, we show that state-of-the-art vision-language models struggle significantly: even GPT-5.2 with high reasoning effort performs only slightly better than a trivial baseline. We also find that larger models and additional context provide only marginal gains, and that universal rules, such as those governing civility and self-promotion, are easier to detect. Our results show that pluralistic moderation of social media is a fundamental challenge for language models.
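The multiple-choice formulation described in the abstract can be sketched as follows. This is a hypothetical illustration, not the benchmark's actual schema: the `Example` fields, `build_prompt` format, and `NO_VIOLATION` option label are all assumptions made for clarity.

```python
# Hypothetical sketch of the multiple-choice task: given a comment and its
# surrounding context, a model must pick which community rule (if any) is
# violated. All names and formats here are illustrative assumptions.
from dataclasses import dataclass

NO_VIOLATION = "None of the above (no rule violated)"

@dataclass
class Example:
    community: str   # e.g. a subreddit name
    context: str     # surrounding thread context
    comment: str     # the comment under moderation
    rules: list      # that community's rules, verbatim
    gold: str        # the violated rule, or NO_VIOLATION

def build_prompt(ex: Example) -> str:
    """Format one example as a multiple-choice question with a 'none' option."""
    options = ex.rules + [NO_VIOLATION]
    lines = [
        f"Community: {ex.community}",
        f"Context: {ex.context}",
        f"Comment: {ex.comment}",
        "Which rule, if any, does the comment violate?",
    ]
    lines += [f"({chr(65 + i)}) {opt}" for i, opt in enumerate(options)]
    return "\n".join(lines)

def accuracy(predictions: list, examples: list) -> float:
    """Fraction of examples where the predicted option matches the gold label."""
    correct = sum(p == ex.gold for p, ex in zip(predictions, examples))
    return correct / len(examples)
```

Because each community supplies its own option set, the same comment can map to different answers (or to no violation at all) depending on where it was posted — which is what makes the task pluralistic.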
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking, NLP datasets, evaluation methodologies, multilingual benchmarks, multimodality, hate-speech detection, safety and alignment, values and culture, policy and governance
Contribution Types: NLP engineering experiment, Data resources, Data analysis
Languages Studied: English, French, Portuguese, German, Spanish, Polish, Dutch, Greek, Italian, Ukrainian.
Submission Number: 7691