WMAJL: Watcher-Mediated Attention Joint Learning Model for Multimodal Relation Extraction

Published: 2025, Last Modified: 23 Jan 2026, ICASSP 2025, License: CC BY-SA 4.0
Abstract: For Multimodal Relation Extraction (MRE), we present the Watcher-Mediated Attention Joint Learning Model (WMAJL), which addresses four challenges: modality alignment noise, cross-modal fusion disparity, the preservation of relative position information in text, and the distinctiveness of classification labels. To mitigate alignment noise, WMAJL combines contrastive learning with variational autoencoder constraints, prioritizing task-relevant semantic information and suppressing content that does not contribute to the task. Its architecture includes a mediator "watcher" that improves cross-modal fusion by mediating information exchange between the textual and visual modalities while preserving the distinctive characteristics of each. In addition, auxiliary tasks such as Named Entity Recognition (NER), together with output supervision, define loss functions that preserve relative position information, so that entity relationships remain precisely represented throughout the multilayer encoding process. A key differentiator of WMAJL is its label-centric self-information loss, inspired by InfoNCE, which trains the model to cluster samples with similar relation labels in semantically coherent regions, making classification labels more distinguishable by capturing subtle differences among relation types. Together, these strategies yield a state-of-the-art F1 score of 84.93% on the MNRE dataset, surpassing existing benchmarks for multimodal knowledge extraction.
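The abstract describes the label-centric loss only as "inspired by InfoNCE", so the following is not the authors' formulation. It is a minimal sketch of one common InfoNCE variant that fits the description, assumed here for illustration: a supervised (SupCon-style) contrastive loss in which, for each anchor in a batch, samples sharing its relation label are positives and every other sample enters the denominator. The function names, the use of cosine similarity, and the default temperature of 0.1 are hypothetical choices. Pure Python is used so the sketch runs without dependencies.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def label_contrastive_loss(
    embeddings: List[Sequence[float]],
    labels: List[int],
    temperature: float = 0.1,
) -> float:
    """Hypothetical sketch: SupCon-style InfoNCE loss, not WMAJL's exact loss.

    For each anchor i, samples with the same relation label are positives;
    all other samples in the batch appear in the softmax denominator.
    Anchors that have no positive in the batch are skipped.
    """
    n = len(embeddings)
    total, counted = 0.0, 0
    for i in range(n):
        # Temperature-scaled similarity from anchor i to every other sample.
        logits = {
            j: cosine(embeddings[i], embeddings[j]) / temperature
            for j in range(n)
            if j != i
        }
        positives = [j for j in logits if labels[j] == labels[i]]
        if not positives:
            continue
        # Numerically stable log-sum-exp over all non-anchor samples.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        # Mean negative log-probability assigned to the positives.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    # Same-label embeddings lie close together.
    clustered = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    # Same-label embeddings are far apart.
    mixed = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
    print(label_contrastive_loss(clustered, labels))
    print(label_contrastive_loss(mixed, labels))
```

The clustered batch yields a loss close to zero while the mixed batch yields a large one, which mirrors the stated goal of grouping same-label samples and separating different relation types. A training implementation would compute this over batched tensors in a deep-learning framework rather than in Python loops.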