Ignore All Previous Instructions: Jailbreaking as a de-escalatory peace building practice to resist LLM social media bots

Published: 01 Mar 2026, Last Modified: 25 Mar 2026, AI4Peace, CC BY 4.0
Track: tiny / short paper (up to 4 pages)
Keywords: social media, LLM, jailbreak, peace building, de-escalatory
TL;DR: LLM-powered social media bots enable state actors to escalate tension and manipulate political discourse, but user-led jailbreaking can serve as a de-escalatory form of peace building to resist these bots.
Abstract: The capacity of Large Language Models (LLMs) to generate large quantities of text at speed has increased the scale and intensified the strategic manipulation of political discourse on social media, contributing to the propagation of conflict escalation narratives. Existing literature largely focuses on platform-led moderation as a countermeasure, yet platform-level approaches have been found to face significant challenges in combatting misinformation. In this paper, we propose a user-centric view of "jailbreaking" as an emergent, non-violent de-escalation practice. Jailbreaking in this setting involves online users engaging with suspected LLM-powered accounts to circumvent LLM safeguards, exposing automated behaviour and disrupting the circulation of misleading narratives. In doing so, it supports user-led efforts to unveil inauthentic accounts and contributes to peace building endeavours.
Submission Number: 6