Simple Yet Effective: Structure Guided Pre-trained Transformer for Multi-modal Knowledge Graph Reasoning
Abstract: Multi-modal knowledge graphs (MKGs) organize information from different modalities in an intuitive way and are utilized in various downstream tasks, such as recommendation. However, most MKGs are still far from complete, which motivates the flourishing of MKG reasoning models. Recently, with the development of general artificial intelligence, pre-trained transformers have drawn increasing attention, especially in multi-modal scenarios. However, research on multi-modal pre-trained transformers (MPTs) for knowledge graph reasoning (KGR) is still at an early stage. The rich structural information underlying an MKG, which is the biggest difference between MKGs and other multi-modal data, remains under-exploited by previous MPTs. Most of them use the graph structure only as a retrieval map for matching the images and texts connected to the same entity, which hinders their reasoning performance. To this end, we propose the graph Structure Guided Multi-modal Pre-trained Transformer for knowledge graph reasoning (SGMPT). Specifically, a graph structure encoder is adopted for structural feature encoding. Then, a structure-guided fusion module with two simple yet effective strategies, i.e., weighted summation and alignment constraint, is designed to inject the structural information into both the textual and visual features. To the best of our knowledge, SGMPT is the first MPT for multi-modal KGR that mines the structural information underlying MKGs. Extensive experiments on FB15k-237-IMG and WN18-IMG demonstrate that SGMPT outperforms existing state-of-the-art models and prove the effectiveness of the designed strategies.
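The abstract names two fusion strategies, weighted summation and alignment constraint. The sketch below illustrates what such strategies could look like in principle; all function names, shapes, the mixing weight `alpha`, and the squared-distance alignment loss are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical sketch of the two structure-guided fusion strategies.
# Shapes, names, and the choice of alignment metric are assumptions.

def weighted_summation(modal_emb, struct_emb, alpha=0.5):
    """Inject structural features into a modality (text/visual) embedding
    via a convex combination controlled by alpha."""
    return alpha * modal_emb + (1.0 - alpha) * struct_emb

def alignment_constraint(modal_emb, struct_emb):
    """Auxiliary loss pulling modality and structural embeddings together
    (mean squared distance here; the paper may use a different metric)."""
    return float(np.mean((modal_emb - struct_emb) ** 2))

rng = np.random.default_rng(0)
text_emb = rng.standard_normal((4, 8))    # e.g. textual entity features
struct_emb = rng.standard_normal((4, 8))  # e.g. graph-structure features

fused = weighted_summation(text_emb, struct_emb, alpha=0.7)
align_loss = alignment_constraint(text_emb, struct_emb)
```

The two strategies are complementary: the weighted sum fuses features directly, while the alignment loss only constrains training, which is why they can be applied individually or together.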
Primary Subject Area: [Content] Multimodal Fusion
Secondary Subject Area: [Content] Multimodal Fusion
Relevance To Conference: Our work makes better use of the graph structure of the KG to align information from different modalities. Specifically, the structure-guided fusion module contains two different strategies, i.e., weighted summation and alignment constraint, which can be adopted individually or in combination. As a plug-and-play mechanism, our method can be easily integrated with other KG reasoning models to achieve better performance.
Submission Number: 2615