Hateful Memes Classification using Machine Learning

Jafar Badour

Published: 04 Dec 2021, Last Modified: 05 May 2026 | 2021 IEEE Symposium Series on Computational Intelligence (SSCI) | Everyone | CC BY 4.0

Abstract: Several studies have produced sophisticated models for sentiment analysis of textual data, and many others have tackled feature extraction from images. However, far fewer studies focus on multimodal data, namely information that arrives through multiple channels. In this work, we focus on the classification of multimodal data, specifically hateful memes. A meme comprises a visual image and a textual caption. We propose two approaches to the multimodal classification problem. The first converts the visual channel into a textual one and feeds it to textual classifiers. The second, which yielded superior results, converts both channels into vector representations and then combines them to represent the visual-textual context. This work grew out of the Facebook Hateful Memes challenge, in which the model developed here ranked 32nd among 3172 competitors. The model is implemented with no domain knowledge or understanding of hate speech. It performed well on the Facebook Hateful Memes challenge dataset and on a novel dataset that we created to show that generic models perform consistently relative to models structured according to domain knowledge. In contrast to the top solution in the Facebook Hateful Memes challenge, this work provides a generic approach that hard-codes no rules ahead of training or validation and is able to learn the definition of hatefulness from any dataset. We also describe the novel dataset, which comprises hateful memes retrieved randomly from the web and serves as a second dataset for testing the generality of the approaches.
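The second approach in the abstract (encode each channel as a vector, then combine the two) can be illustrated with a minimal sketch. The abstract does not say which encoders are used or how the vectors are combined, so everything below is an assumption made for illustration: the combination rule is plain concatenation, the classifier is a small logistic regression, and the hypothetical helper `toy_meme` produces random vectors that stand in for real image and caption embeddings.

```python
import math
import random


def fuse(image_vec, text_vec):
    """Combine the two channel vectors by concatenation (assumed fusion rule)."""
    return list(image_vec) + list(text_vec)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, features):
    """Probability that a fused vector belongs to the positive class."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def train_logistic(samples, labels, epochs=100, lr=0.1):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights, bias = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            err = predict_proba(weights, bias, x) - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def toy_meme(rng, label):
    """Hypothetical stand-in for encoder outputs: a 3-d image vector and a
    2-d text vector whose means shift with the label (not real embeddings)."""
    mu = 1.0 if label == 1 else -1.0
    image_vec = [rng.gauss(mu, 0.3) for _ in range(3)]
    text_vec = [rng.gauss(mu, 0.3) for _ in range(2)]
    return image_vec, text_vec


rng = random.Random(0)
labels = [i % 2 for i in range(40)]
fused = [fuse(*toy_meme(rng, y)) for y in labels]

weights, bias = train_logistic(fused, labels)
predictions = [1 if predict_proba(weights, bias, x) >= 0.5 else 0 for x in fused]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(f"fused dimension: {len(fused[0])}, training accuracy: {accuracy:.2f}")
```

In a real pipeline the toy vectors would be replaced by the outputs of an image encoder and a text encoder, and the fusion step could just as well be something other than concatenation; the sketch only shows the overall shape of combining two channel vectors before classification.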