EXPLEME: A Study in Meme Interpretability, Diving Beyond Input Attribution

23 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: Meme, Offensiveness, Interpretability, Language Modeling
TL;DR: We develop a novel and theoretically grounded interpretability technique beyond standard input attributions for meme offensiveness detection.
Abstract: Memes, originally created for humor and social commentary, have evolved into vehicles for offensive and harmful content online. Detecting such content is crucial for upholding the integrity of digital spaces. However, binary classification of memes as offensive or not often falls short in practical applications: ensuring the reliability of these classifiers and addressing inadvertent biases introduced during training are essential tasks. While numerous input-attribution-based interpretability methods exist to shed light on a model's decision-making process, they frequently yield insufficient and semantically irrelevant keywords extracted from the input memes. In response, we propose a novel, theoretically grounded approach that extracts meaningful "tokens" from a global vocabulary, yielding a set of interpretable keywords that is both relevant and exhaustive. This method provides valuable insights into the model's behavior and uncovers hidden meanings within memes, significantly enhancing transparency and fostering user trust. Through comprehensive quantitative and qualitative evaluations, we demonstrate the superior effectiveness of our approach over conventional baselines. Our research contributes to a deeper understanding of meme content analysis and to the development of more robust and interpretable multimodal systems.
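To make the contrast with input attribution concrete, below is a minimal, purely illustrative sketch of the idea of scoring a global vocabulary against a classifier's decision representation rather than ranking only the tokens present in the input. The abstract does not specify the paper's actual scoring function; the cosine-similarity scoring, the toy vocabulary, and all variable names here are hypothetical stand-ins, not the authors' method.

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-ins (not the paper's components): a small global
# vocabulary with an embedding table, and a fused image+text meme
# representation taken from a classifier's penultimate layer.
vocab = ["innocent", "joke", "slur", "mock", "targeted", "harmless"]
emb = torch.nn.Embedding(len(vocab), 16)   # toy token embeddings
meme_repr = torch.randn(16)                # toy fused decision representation

# Input attribution can only rank tokens that appear in the meme itself.
# A global-vocabulary approach instead scores *every* vocabulary token
# against the decision representation, so semantically relevant keywords
# can surface even when they never occur in the input.
scores = F.cosine_similarity(emb.weight, meme_repr.unsqueeze(0), dim=1)
topk = torch.topk(scores, k=3)
for idx, score in zip(topk.indices.tolist(), topk.values.tolist()):
    print(f"{vocab[idx]:>10s}  {score:+.3f}")
```

Under this (assumed) framing, the output is a ranked keyword list drawn from the whole vocabulary, which is what allows the extracted explanations to be both relevant and exhaustive rather than limited to words visible in the meme.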
Primary Area: societal considerations including fairness, safety, privacy
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7795