Abstract: Multimodal knowledge graphs, which integrate visual and textual data, have been widely utilized in various applications. Despite their potential, they often suffer from incompleteness, as a large amount of valuable triple knowledge remains undiscovered within the graph. This has led to a surge in research on multimodal knowledge graph completion. However, existing methods often face challenges such as irrelevant noise in multimodal data and the limitations of straightforward multimodal fusion, both of which can lead to suboptimal model performance. In this paper, we propose a novel adaptive multimodal graph learning approach for efficient knowledge graph completion. We first introduce an adaptive multimodal knowledge capture module designed to integrate entity-related multimodal knowledge. This module includes a relation-aware modal view construction process to achieve semantic consistency, along with link-aware point-to-face interactions for coarse-grained multimodal capture and adaptive point-to-point interactions for fine-grained multimodal extraction. We then propose a two-stage multimodal graph fusion module, which includes a cross-modal augmentation module to perform first-stage multimodal fusion between the entity graph and the visual/textual graphs, as well as a dynamic selection fusion module to conduct second-stage fusion between the entity-visual and entity-textual graphs. Empirical evaluations on three common datasets demonstrate the superior effectiveness of our method.
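As a rough illustration of the architecture described in the abstract, the following PyTorch sketch shows one way the two components might be organized: an adaptive multimodal capture step (relation-aware visual/textual views, coarse-grained point-to-face attention, fine-grained point-to-point gating) followed by a two-stage fusion (entity-visual and entity-textual fusion, then dynamic selection). All class names, layer choices, and the reduction of graph-level fusion to embedding-level operations are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveMultimodalFusion(nn.Module):
    """Hypothetical sketch of (1) adaptive multimodal knowledge capture and
    (2) two-stage multimodal fusion as described in the abstract."""

    def __init__(self, dim: int):
        super().__init__()
        # Relation-aware modal view construction: project each modality
        # conditioned on the relation embedding for semantic consistency.
        self.vis_view = nn.Linear(2 * dim, dim)
        self.txt_view = nn.Linear(2 * dim, dim)
        # Fine-grained point-to-point gate (element-wise adaptive weighting).
        self.gate = nn.Linear(2 * dim, dim)
        # Second-stage dynamic selection between entity-visual and
        # entity-textual fused representations.
        self.select = nn.Linear(2 * dim, 2)

    def forward(self, ent, rel, vis, txt):
        # ent, rel, vis, txt: (batch, dim) embeddings for the head entity,
        # the relation, and the entity's visual / textual features.
        vis_r = self.vis_view(torch.cat([vis, rel], dim=-1))  # relation-aware visual view
        txt_r = self.txt_view(torch.cat([txt, rel], dim=-1))  # relation-aware textual view

        # Coarse-grained "point-to-face" interaction: the entity attends to
        # the set (face) of modal views via scaled dot-product weights.
        views = torch.stack([vis_r, txt_r], dim=1)             # (batch, 2, dim)
        attn = F.softmax((views @ ent.unsqueeze(-1)).squeeze(-1)
                         / ent.size(-1) ** 0.5, dim=-1)        # (batch, 2)
        coarse = (attn.unsqueeze(-1) * views).sum(dim=1)

        # Fine-grained "point-to-point" interaction: element-wise gating
        # between the entity embedding and each relation-aware modal view.
        g = torch.sigmoid(self.gate(torch.cat([ent, coarse], dim=-1)))
        ent_vis = g * ent + (1 - g) * vis_r                    # stage-1 fusion (entity-visual)
        ent_txt = g * ent + (1 - g) * txt_r                    # stage-1 fusion (entity-textual)

        # Stage-2 dynamic selection fusion over the two fused views.
        w = F.softmax(self.select(torch.cat([ent_vis, ent_txt], dim=-1)), dim=-1)
        return w[:, :1] * ent_vis + w[:, 1:] * ent_txt


# Toy usage: score a triple with a TransE-style distance on the fused head
# entity (the scoring function is an assumption for illustration only).
model = AdaptiveMultimodalFusion(dim=64)
h, r, t = torch.randn(8, 64), torch.randn(8, 64), torch.randn(8, 64)
vis, txt = torch.randn(8, 64), torch.randn(8, 64)
score = -torch.norm(model(h, r, vis, txt) + r - t, dim=-1)
```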