Probing the Impacts of Visual Context in Multimodal Entity Alignment

Published: 01 Jan 2022, Last Modified: 08 Jun 2023. Venue: APWeb/WAIM (2) 2022
Abstract: Multimodal entity alignment (MMEA) aims to identify equivalent entities across different multimodal knowledge graphs (KGs), a topic that has drawn increasing attention in recent years. Although the benefits of multimodal information have been observed, its negative impacts are non-negligible: injecting images without constraints introduces considerable noise. It also remains unknown to what extent, or under what circumstances, visual context is truly helpful to the task. In this work, we employ graph structures and visual context to align entities across multimodal KGs and propose to selectively combine the feature similarities between cross-KG entities from these two aspects when making alignment decisions. Specifically, we exploit image classification techniques and entity types to remove potentially unhelpful images (visual noise) by generating entity mask vectors used during both learning and inference. Extensive experiments validate that incorporating selected visual context substantially improves MMEA. We also provide a thorough analysis of the impacts of the visual modality and discuss several cases where injecting entity images induces misalignment.
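The abstract describes gating visual similarity with entity mask vectors so that only entities whose images survive the classifier/entity-type filter contribute visual evidence. Below is a minimal sketch of how such a selective combination could look; the function names, the binary mask convention, the cosine-similarity choice, and the weight `alpha` are all illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def cosine_sim(a, b):
    # Pairwise cosine similarity between rows of a (n x d) and b (m x d).
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def combined_similarity(struct_src, struct_tgt, vis_src, vis_tgt,
                        mask_src, mask_tgt, alpha=0.6):
    """Selectively combine structural and visual similarities (hypothetical sketch).

    mask_src / mask_tgt are binary vectors: 1 if an entity's image passed
    the image-classification / entity-type filter, 0 if it was judged noisy.
    Visual similarity contributes only when both entities keep their images.
    """
    s_struct = cosine_sim(struct_src, struct_tgt)   # (n, m) structure similarities
    s_vis = cosine_sim(vis_src, vis_tgt)            # (n, m) visual similarities
    pair_mask = np.outer(mask_src, mask_tgt)        # (n, m): 1 where both images are kept
    # Fall back to structure alone wherever visual context is masked out.
    return np.where(pair_mask == 1,
                    alpha * s_struct + (1 - alpha) * s_vis,
                    s_struct)
```

Under this reading, masking happens at the similarity-combination step rather than inside the encoders, so a noisy image can never outvote structural evidence for its entity; how the paper weights the two channels in practice may differ.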