LLM-driven Hateful Meme Detection via Cross-modal Memorizing and Self-rejection Training

Published: 22 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Hateful Meme Detection, Multimodality, Self-Rejection Training, Cross-modal Memorizing
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Hateful meme detection (HMD) determines whether online multimodal content carries harmful information and thus plays a pivotal role in maintaining a healthy internet ecosystem. HMD is predominantly viewed as a multimodal task: the harmful message in a meme is expressed through the combination of visual and text content (e.g., contradictions between them) rather than through either modality alone. Effective modeling and smooth integration of multimodal information are therefore crucial for strong HMD performance. Current research on HMD conventionally models visual and text data independently, then aligns and merges the multimodal features for HMD predictions. However, existing studies struggle to identify hateful information that derives from complementarities or contradictions between image and text, where in most cases neither the image nor the text alone carries explicitly hateful content. Moreover, these studies do not leverage large language models (LLMs), which have been demonstrated to be effective in cross-modal information processing. In this paper, we therefore propose a multimodal approach for HMD that follows the encoding-decoding paradigm, combining an LLM with a memory module enhanced by self-rejection training. The memory module learns the relationships between image and text that characterize hateful memes; its output is fed into the LLM together with the visual and text features to predict HMD labels. Self-rejection training performs discriminative learning over memory outputs and enhances the memory module to further improve HMD. We evaluate our approach on English and Chinese benchmark datasets, where it outperforms strong baselines, demonstrating the effectiveness of our model design and each of its components.
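The abstract gives only a high-level description of the memory module and self-rejection training; the sketch below is an illustrative reconstruction, not the authors' implementation. All concrete choices here are assumptions: the additive image-text fusion, the attention-based memory readout, the dropout-style perturbation used to draw candidate readouts, and the stand-in scoring function that splits candidates into a preferred/rejected pair.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16       # feature dimension (illustrative)
n_slots = 8  # number of memory slots (assumption)

# Memory of image-text relationship patterns; randomly initialised
# here, but trained end-to-end in the approach the abstract describes.
mem_keys = rng.normal(size=(n_slots, d))
mem_vals = rng.normal(size=(n_slots, d))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def memory_readout(img_feat, txt_feat):
    """Attend over memory with a fused image-text query; the readout
    is the relationship vector fed to the LLM alongside both
    modalities' features. Additive fusion is a simplification."""
    query = img_feat + txt_feat
    attn = softmax(mem_keys @ query / np.sqrt(d))
    return attn @ mem_vals, attn

def self_rejection_pair(img_feat, txt_feat, score_fn, noise=0.1):
    """Draw two perturbed memory readouts and split them into a
    preferred/rejected pair via a scoring function -- the kind of
    discriminative signal self-rejection training could use to
    refine the memory. The scorer is a stand-in."""
    cands = []
    for _ in range(2):
        r, _ = memory_readout(
            img_feat + rng.normal(scale=noise, size=d), txt_feat)
        cands.append(r)
    a, b = cands
    return (a, b) if score_fn(a) >= score_fn(b) else (b, a)

img = rng.normal(size=d)
txt = rng.normal(size=d)
readout, attn = memory_readout(img, txt)
chosen, rejected = self_rejection_pair(img, txt, score_fn=lambda r: r.sum())
```

In this sketch the memory readout is a standard attention lookup, so its output stays differentiable with respect to the memory slots; the preferred/rejected pair would then drive a contrastive or ranking loss on the memory module.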
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4738