Listen to Both Sides and be Enlightened! -- Hierarchical Modality Fusion Network for Entity and Relation Extraction

Anonymous

16 Nov 2021 (modified: 05 May 2023) | ACL ARR 2021 November Blind Submission | Readers: Everyone
Abstract: Multimodal named entity recognition (MNER) and multimodal relation extraction (MRE) form a fundamental branch of multimodal learning. However, existing approaches to MNER and MRE mainly suffer from 1) error sensitivity when images contain irrelevant concepts not mentioned in the text; and 2) a large modality gap between image and text features, especially for hierarchical visual features. To address these issues, we propose a novel Hierarchical Modality fusion NeTwork (HMNeT) for visual-enhanced entity and relation extraction, aiming to reduce the modality gap and achieve more effective and robust performance. Specifically, we leverage hierarchical pyramidal visual features to conduct multi-layer internal integration in the Transformer. We further present a dynamic gated aggregation strategy that decides modality integration according to different images. Extensive experiments on three benchmark datasets demonstrate the effectiveness of our method, which achieves state-of-the-art performance.
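The abstract does not specify how the dynamic gated aggregation is computed, so the following is only an illustrative sketch of one plausible reading: each Transformer layer gets its own softmax gate over the pyramid levels, and the gate scores depend on the image through mean-pooled features. The function names, the mean pooling, and the `gate_weights` parameter layout are all assumptions made for illustration and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(level):
    """Average a list of region vectors (one pyramid level) into one vector."""
    n = len(level)
    dim = len(level[0])
    return [sum(vec[d] for vec in level) / n for d in range(dim)]


def gated_aggregate(pyramid, gate_weights):
    """Mix pyramid levels into one visual vector per Transformer layer.

    pyramid: list of levels; each level is a list of region vectors that
        all share the same dimension.
    gate_weights: hypothetical learned parameters (an assumption, not the
        paper's parameterization). gate_weights[layer][level] is a weight
        vector; a level's score is its dot product with the pooled feature.

    Returns a list with one (gates, mixed_vector) pair per layer.
    """
    pooled = [mean_pool(level) for level in pyramid]
    dim = len(pooled[0])
    per_layer = []
    for layer_w in gate_weights:
        # Scores depend on the image through the pooled features,
        # so different images produce different gates.
        scores = [
            sum(w_d * p_d for w_d, p_d in zip(w, p))
            for w, p in zip(layer_w, pooled)
        ]
        gates = softmax(scores)
        mixed = [sum(g * p[d] for g, p in zip(gates, pooled)) for d in range(dim)]
        per_layer.append((gates, mixed))
    return per_layer
```

In this sketch, a layer whose gate weights are all zero falls back to a uniform average of the pyramid levels, while non-zero weights let the image content shift which level dominates at that layer.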