A multimodal-multitask framework with cross-modal relation and hierarchical interactive attention for semantic comprehension

Published: 01 Jan 2026, Last Modified: 19 Sept 2025. Information Fusion, 2026. License: CC BY-SA 4.0.
Abstract

Highlights:
- The proposed work provides a multimodal–multitask method for hate content detection.
- The proposed work can also detect sarcasm, motivation, humor, and sentiment.
- A novel cross-modal relation graph method is proposed for feature reconstruction.
- The proposed Hierarchical Interactive Monomodal Attention (HIMA) benefits multitasking.
- Extensive experiments are performed on three hateful memes datasets.