Abstract: In this paper, we address the challenge of categorizing hateful content on social media through the analysis of online screenshots. Such screenshots may contain only text, an image with a caption, or an image with embedded text. OCR-based techniques may help only in the first case, while the other two cases would require visual language models. We combine techniques based on OCR-extracted text with large language models and visual language models to classify the type of content and its source. The results show that the task is difficult, although our experiments shed some light on possible effective solutions.