Abstract: Facebook, Twitter, Instagram, and other social media sites allow anonymity and independence, letting people exercise their right to free expression without fear of repercussions. However, in the absence of thorough moderation, users fall prey to offensive content, trolls, and social media predators. Memes, a form of multimodal media, are becoming increasingly popular online. While most memes are meant to be humorous, some use dark humor to disseminate offensive content. Our present research focuses on learning the dependency and correlation between three tasks, viz., detecting offensive memes, classifying offensive memes into fine-grained categories, and detecting the emotions expressed in a meme. To this end, we create EmoffMeme, a large-scale multimodal dataset for Hindi. We aim to gain insight into the hidden emotions of social media users by jointly studying a meme's text and image. We present an end-to-end multitask deep neural network built on CLIP (Contrastive Language-Image Pre-training) to solve these correlated tasks simultaneously. We also employ Multimodal Factorized Bilinear (MFB) pooling to fuse a meme's textual and visual parts into a single joint representation. We demonstrate the effectiveness of our approach through extensive experiments. The evaluation shows that the proposed multitask framework yields better performance on the primary task, i.e., offensiveness identification, with the help of the secondary task, i.e., emotion analysis.
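To make the described architecture concrete, the following is a minimal, illustrative PyTorch sketch of how CLIP-style text and image embeddings could be fused with MFB pooling and shared across three task heads (offensiveness, fine-grained offense category, emotion). The embedding dimensions, factor size, and class counts are assumptions for illustration, not the paper's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MFBFusion(nn.Module):
    """Multimodal Factorized Bilinear (MFB) pooling: projects a text vector and
    an image vector, fuses them by element-wise product, and sum-pools over the
    latent factors to obtain one joint representation (dimensions assumed)."""
    def __init__(self, text_dim=512, image_dim=512, factor_k=5, out_dim=256):
        super().__init__()
        self.k, self.o = factor_k, out_dim
        self.text_proj = nn.Linear(text_dim, factor_k * out_dim)
        self.image_proj = nn.Linear(image_dim, factor_k * out_dim)

    def forward(self, text_feat, image_feat):
        # Element-wise product of the two expanded modalities ...
        fused = self.text_proj(text_feat) * self.image_proj(image_feat)
        # ... followed by sum-pooling over the k factors.
        fused = fused.view(-1, self.o, self.k).sum(dim=2)
        # Power normalisation and L2 normalisation, as in standard MFB.
        fused = torch.sign(fused) * torch.sqrt(torch.abs(fused) + 1e-12)
        return F.normalize(fused, dim=-1)

class MultitaskMemeClassifier(nn.Module):
    """Sketch of the multitask setup: a shared MFB fusion module feeding three
    task-specific heads. Class counts here are placeholders."""
    def __init__(self, dim=256, n_offense=2, n_fine=3, n_emotion=6):
        super().__init__()
        self.fusion = MFBFusion(out_dim=dim)
        self.offense_head = nn.Linear(dim, n_offense)   # offensive vs. not
        self.fine_head = nn.Linear(dim, n_fine)         # fine-grained offense class
        self.emotion_head = nn.Linear(dim, n_emotion)   # emotion label

    def forward(self, text_feat, image_feat):
        z = self.fusion(text_feat, image_feat)
        return self.offense_head(z), self.fine_head(z), self.emotion_head(z)

# Usage with 512-d embeddings, e.g. from a frozen CLIP text/image encoder:
model = MultitaskMemeClassifier()
text_emb, image_emb = torch.randn(4, 512), torch.randn(4, 512)
offense_logits, fine_logits, emotion_logits = model(text_emb, image_emb)
```

In such a setup, the three heads would typically be trained jointly with a weighted sum of their losses, so that the emotion (secondary) task regularizes and informs the offensiveness (primary) task; the exact weighting used in the paper is not specified here.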