Abstract: As Large Language Models (LLMs) become increasingly accessible, their potential to be exploited for generating manipulative content poses a threat to society. This study investigates LLMs' ability to produce propaganda when prompted. Using two domain-specific models, we systematically evaluate the generated content. The first model classifies content as propaganda or non-propaganda by detecting underlying patterns in the text. The second model detects specific rhetorical techniques of propaganda at the fragment level. Our findings show that LLMs can not only generate propaganda that closely resembles human-written propaganda but also employ a variety of similar rhetorical techniques. Furthermore, we explore the effect of mitigation strategies, namely Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Odds Ratio Preference Optimization (ORPO), on LLMs' propaganda generation capabilities. We find that fine-tuning significantly reduces LLMs' tendency to generate such content, with ORPO proving to be the most effective method.
Paper Type: Long
Research Area: Computational Social Science and Cultural Analytics
Research Area Keywords: propaganda, misinformation detection, quantitative analyses of LLM-generated news, LLMs
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 4636