Abstract: Online memes are a powerful yet challenging medium for content moderation, often masking harmful intent behind humor, irony, or cultural symbolism. Conventional moderation systems, especially those relying on explicit text, frequently fail to recognize such subtle or implicit harm. We introduce MemeSense, an adaptive framework designed to generate socially grounded interventions for harmful memes by combining visual and textual understanding with curated, semantically aligned examples enriched with commonsense cues. This enables the model to detect nuanced, complex threats such as misogyny, stereotyping, or vulgarity, even in memes lacking overt language. Across multiple benchmark datasets, MemeSense outperforms state-of-the-art methods, achieving up to 35% higher semantic similarity and a 9% improvement in BERTScore for non-textual memes, with notable gains for text-rich memes as well. These results highlight MemeSense as a promising step toward safer, more context-aware AI systems for real-world content moderation. The code and data are available at: https://github.com/sayantan11995/MemeSense
Submission Type: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Here we summarize the minor revisions made to the original submission, as suggested by the AE.
**It doesn’t test across a wide range of model sizes or families...**: We now explicitly state in the **Introduction** that our framework uses mid-sized language models (8B-9B parameters) in post-hoc (zero-shot/in-context) settings, without additional fine-tuning. We also mention the public release of our dataset in the *contribution* part.
**The authors should improve the writing. Some details on metrics are missing...**: We added a paragraph, **Training and Test Split**, in Section 5 to remove ambiguity in the selection of the training and test datasets. In Section 6.1, we provide the detailed experimental setup for the *MemeGuard* method to reduce inconsistency. We also added details on how the *Readability Score* is computed in Section 6.3.
**The MemeSense framework proposed in this paper demonstrates some innovation in identifying multimodal harmful content, but it still suffers from deficiencies in cross-cultural generalization, failure case analysis...**: We added a large-scale manual evaluation and a study of the cross-cultural generalization of our method in Section 8, and a *comprehensive failure case analysis* in Appendix E.
Code: https://github.com/sayantan11995/MemeSense
Assigned Action Editor: ~Gunhee_Kim1
Submission Number: 5318