Multimodal Hate Speech Detection via Cross-Domain Knowledge Transfer

2022 (modified: 22 Nov 2022), ACM Multimedia 2022, Readers: Everyone
Abstract: Hate speech on social networks now spreads more often as combined text and images than as text alone, creating a pressing need for multimodal hate speech detection. Current research on this task mainly focuses on constructing multimodal models and does not consider the influence of unbalanced and widely distributed samples across the various attack types in hate speech. In this situation, introducing additional knowledge is necessary for understanding the attack categories of hate speech comprehensively. Because hate speech detection and sarcasm detection are highly correlated, this paper makes an initial attempt at common knowledge transfer between the two tasks, with hate speech detection defined as the primary task and sarcasm detection as the auxiliary task. A scalable cross-domain knowledge transfer (CDKT) framework is proposed, in which mainstream vision-language transformers can be flexibly employed as the backbone. The framework includes three modules that simultaneously bridge the semantic, definition, and domain gaps between the primary and auxiliary tasks. Specifically, the semantic adaptation module formulates the irrelevant parts between image and text in the primary and auxiliary tasks and disentangles them from the text representation to align the visual and word tokens. The definition adaptation module assigns different weights to the training samples of the auxiliary task by measuring the correlation between samples of the auxiliary and primary tasks. The domain adaptation module minimizes the feature distribution gap between samples of the two tasks. Extensive experiments show that the proposed CDKT provides a stable improvement over baselines and achieves competitive performance compared with some existing multimodal hate speech detection methods.
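
The abstract does not specify how the definition and domain adaptation modules are computed, so the following is only a minimal sketch of the two ideas under stated assumptions. It assumes that "correlation" between auxiliary and primary samples can be approximated by cosine similarity to the mean primary-task feature, and that the "feature distribution gap" can be approximated by the squared distance between feature means (a linear-kernel MMD). The function names `auxiliary_weights` and `mean_gap` and the toy feature vectors are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vector(rows):
    """Column-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def auxiliary_weights(aux_feats, primary_feats):
    """Assumed stand-in for definition adaptation: weight each auxiliary
    sample by its similarity to the primary-task centroid, then normalize
    the weights with a softmax so they sum to 1."""
    centroid = mean_vector(primary_feats)
    sims = [cosine(f, centroid) for f in aux_feats]
    shift = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]


def mean_gap(aux_feats, primary_feats):
    """Assumed stand-in for the domain gap: squared Euclidean distance
    between the two feature means (linear-kernel MMD)."""
    mean_aux = mean_vector(aux_feats)
    mean_primary = mean_vector(primary_feats)
    return sum((a - b) ** 2 for a, b in zip(mean_aux, mean_primary))


# Toy 2-D features, invented purely for illustration.
primary = [[1.0, 0.0], [1.0, 0.2]]
auxiliary = [[1.0, 0.1], [0.0, 1.0]]

weights = auxiliary_weights(auxiliary, primary)
gap = mean_gap(auxiliary, primary)
```

In this toy setup the first auxiliary sample points in nearly the same direction as the primary centroid, so it receives the larger weight; in a training loop, such weights would scale the per-sample auxiliary loss and a gap term like `mean_gap` would be added as a penalty to minimize.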