A Text-Enhanced Transformer Fusion Network for Multimodal Knowledge Graph Completion

Published: 01 Jan 2024 · Last Modified: 21 Jul 2025 · IEEE Intell. Syst. 2024 · CC BY-SA 4.0
Abstract: Multimodal knowledge graphs (MKGs) organize multimodal facts in the form of entities and relations, and have been successfully applied to several downstream tasks. Because most MKGs are incomplete, the MKG completion task has been proposed to address this problem; it aims to infer the missing entities in MKGs. Most previous works obtain reasoning ability by capturing the correlation between target triplets and related images, but they ignore contextual semantic information, and their reasoning processes are not easily interpretable. To address these issues, we propose a novel text-enhanced transformer fusion network, which converts the context path between head and tail entities into natural-language text and fuses multimodal features at both coarse and fine granularities through a multigranularity fuser. This not only enhances textual semantic information but also improves the model's interpretability through the introduced paths. Experimental results on benchmark datasets demonstrate the effectiveness of our model.
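The abstract mentions converting the context path between head and tail entities into natural-language text. As a minimal sketch of that idea, the snippet below verbalizes a path of (head, relation, tail) triplets into one sentence; the entity names, relation names, and template are illustrative assumptions, not the paper's exact procedure.

```python
# Hedged sketch: verbalize a context path (a chain of triplets) into text.
# The template and example triplets are assumptions for illustration only.

def verbalize_path(path):
    """Turn a list of (head, relation, tail) triplets into one sentence."""
    clean = lambda s: s.replace("_", " ")  # surface form from a KG identifier
    clauses = [f"{clean(h)} {clean(r)} {clean(t)}" for h, r, t in path]
    return ", and ".join(clauses) + "."

path = [
    ("Leonardo_da_Vinci", "painted", "Mona_Lisa"),
    ("Mona_Lisa", "is_located_in", "Louvre"),
]
print(verbalize_path(path))
# "Leonardo da Vinci painted Mona Lisa, and Mona Lisa is located in Louvre."
```

The resulting sentence could then be encoded by a text encoder and fused with visual features by the multigranularity fuser described in the abstract.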