Track: long paper (up to 4 pages)
Keywords: Misinformation, Content Moderation, Community Notes, LLMs
TL;DR: Despite the impressive capabilities of LLMs, we show that they cannot synthesize misinformation moderation notes that match the quality of human-written ones.
Abstract: Despite significant advances in Large Language Models (LLMs), their effectiveness in social media misinformation moderation, specifically in generating moderation notes whose accuracy, coherence, and citation reliability match human-written Community Notes (CNs) on X, remains an open question. In this work, we introduce ModBench, a real-world misinformation moderation benchmark consisting of tweets flagged as misleading alongside their corresponding human-written CNs. We evaluate representative open- and closed-source LLMs on ModBench, prompting them to generate CN-style moderation notes with access to human-written CN demonstrations and the web-sourced references used by CN creators. Our findings reveal persistent and significant flaws in LLM-generated moderation notes, underscoring the continued need for trustworthy human-written information to ensure accurate and reliable misinformation moderation.
Anonymization: This submission has been anonymized for double-blind review by removing identifying information such as names, affiliations, and URLs.
Submission Number: 44