MemeGuard: Transformer-Based Fusion for Multimodal Propaganda Detection in Low-Resource Social Media Memes

Published: 02 Dec 2025, Last Modified: 23 Dec 2025 · MMLoSo 2025 Oral · CC BY 4.0
Keywords: Natural language processing, Propaganda detection, Multimodal content, Social media memes, Transformer Fusion
Abstract: Memes are now a common means of communication on social media; their humor and compact format help messages spread quickly and widely. Propagandistic memes combine text and images to influence opinions and behavior, often by appealing to emotions or ideologies. While propaganda detection has been well studied in high-resource languages (HRLs), low-resource languages (LRLs) such as Bengali have received limited attention. In this study, we introduce MemeGuard, a new dataset of 3,745 memes for propaganda detection in Bengali. We evaluated more than 45 approaches, spanning unimodal models and multimodal fusion methods. For text, BanglaBERT-1 achieved the best macro F1 score of 80.34%, while the CLIP vision transformer scored 78.94% on images. The proposed multimodal model, which combines BanglaBERT-2 and CLIP through Adaptive Modality Fusion, achieved the highest macro F1 score of 85.36%. This work establishes a strong baseline and offers valuable insights for future research in Bengali multimodal content analysis.
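The abstract names "Adaptive Modality Fusion" over BanglaBERT-2 (text) and CLIP (image) features but does not specify its form. The sketch below is an illustrative gated-fusion interpretation, not the authors' implementation: the embedding dimensions, the sigmoid gating formulation, and the class count are assumptions.

```python
import torch
import torch.nn as nn

class AdaptiveModalityFusion(nn.Module):
    """Illustrative gated fusion of text and image embeddings.

    Assumptions (not from the paper): 768-d pooled BanglaBERT-2 text features,
    512-d CLIP ViT image features, a scalar sigmoid gate per sample, and a
    binary propaganda / non-propaganda output.
    """

    def __init__(self, text_dim=768, image_dim=512, hidden_dim=512, num_classes=2):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.image_proj = nn.Linear(image_dim, hidden_dim)
        # Gate predicts a per-sample weight balancing the two modalities.
        self.gate = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),
        )
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, text_emb, image_emb):
        t = torch.tanh(self.text_proj(text_emb))      # (batch, hidden_dim)
        v = torch.tanh(self.image_proj(image_emb))    # (batch, hidden_dim)
        alpha = self.gate(torch.cat([t, v], dim=-1))  # (batch, 1), in [0, 1]
        fused = alpha * t + (1 - alpha) * v           # adaptive per-sample weighting
        return self.classifier(fused)                 # class logits

# Example: random stand-ins for pooled BanglaBERT-2 and CLIP features, batch of 4 memes.
model = AdaptiveModalityFusion()
logits = model(torch.randn(4, 768), torch.randn(4, 512))
print(logits.shape)  # torch.Size([4, 2])
```

A learned gate of this kind lets the classifier lean on the text branch when the image is uninformative and vice versa; other readings of "adaptive fusion" (e.g., cross-attention) are equally plausible given only the abstract.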
Submission Number: 13