Abstract: Despite recent advances in Multimodal Relation Extraction (MRE), existing datasets and approaches primarily focus on single-image scenarios, overlooking prevalent real-world cases where relationships are expressed through multiple images alongside text. To address this limitation, we present MRE-MI, a novel human-annotated dataset that includes both multi-image and single-image instances for relation extraction. Beyond dataset creation, we establish comprehensive baselines and propose a simple model, the Global and Local Relevance-Modulated Attention (GLRA) model, to address the new challenges of multi-image scenarios. Our extensive experiments reveal that incorporating multiple images substantially improves relation extraction in multi-image scenarios. Furthermore, GLRA achieves state-of-the-art results on MRE-MI, demonstrating its effectiveness. The dataset and source code are available at https://github.com/JinFish/MRE-MI.