A multimodal-multitask framework with cross-modal relation and hierarchical interactive attention for semantic comprehension

Published: 01 Jan 2026, Last Modified: 19 Sept 2025. Information Fusion, 2026. License: CC BY-SA 4.0.
Abstract

Highlights:
- The proposed work provides a multimodal–multitask method for hate content detection.
- The proposed work can also detect sarcasm, motivation, humor, and sentiment.
- A novel cross-modal relation graph method is proposed for feature reconstruction.
- The proposed Hierarchical Interactive Monomodal Attention (HIMA) benefits multitasking.
- Extensive experiments are performed on three hateful memes datasets.