Cross-modal multi-relational graph reasoning: A novel model for multimodal textbook comprehension

Published: 01 Jan 2025, Last Modified: 14 May 2025, Inf. Fusion 2025, CC BY-SA 4.0
Abstract: Highlights
• Cross-modal graphs are dynamically constructed to capture image-text interactions.
• The graph structure is adapted to specific tasks via label-driven reasoning.
• The model dynamically learns and adjusts its internal representation based on the specific task.
• The model achieves cutting-edge performance on cross-modal textbook reasoning tasks.