Abstract: As Large Language Models (LLMs) become increasingly accessible, their potential to be exploited for generating manipulative content poses a threat to society. This study investigates LLMs' ability to produce propaganda when prompted. Using two domain-specific models, we systematically evaluate the generated content. The first model classifies content as propaganda or non-propaganda by detecting underlying patterns in the text. The second model detects specific rhetorical techniques of propaganda at the fragment level. Our findings show that LLMs can not only generate propaganda that closely resembles human-written propaganda but also employ a variety of similar rhetorical techniques. Furthermore, we explore the effect of mitigation strategies, namely Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Odds Ratio Preference Optimization (ORPO), on LLMs' propaganda generation capabilities. We find that fine-tuning significantly reduces LLMs' tendency to generate such content, with ORPO proving to be the most effective method.
Paper Type: Long
Research Area: Computational Social Science and Cultural Analytics
Research Area Keywords: propaganda, misinformation detection, quantitative analyses of LLM-generated news, LLMs
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 4636