Beyond Embedding Fusion: LLM-Driven Structural Enhancement for Multimodal Knowledge Graph Completion
Keywords: Knowledge graph, Multimodal, Large Language Model
Abstract: Multimodal knowledge graph completion (MKGC) aims to improve structural reasoning by incorporating visual and textual information.
However, existing approaches rely heavily on embedding fusion, where multimodal features are compressed and fused into a unified vector before structural prediction. This compression-then-fusion paradigm inevitably discards much of the rich semantics carried by the raw modalities, treating them as auxiliary cues rather than as sources of explicit structural knowledge. As a result, current MKGC methods often fail to capture the deeper relational semantics implied in texts and images. To address this limitation, we propose LLM-SE (Large Language Model–driven Structural Enhancement), a generate-then-disentangle framework that transforms raw multimodal signals into explicit structural triplets instead of collapsing them into unified embeddings. LLM-SE comprises two main modules: (1) Multimodal Triplet Generation, which performs the generation step by leveraging large multimodal models to extract meaningful triplets from texts and images; and (2) a Dual-View Complex module, a disentanglement mechanism that separates original triplets from LLM-generated deep triplets, enabling the model to adaptively capture both stable and exploratory knowledge. Extensive experiments on multiple MKGC benchmarks show that LLM-SE consistently outperforms state-of-the-art models across all metrics.
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: Generation, Information Extraction, Machine Learning for NLP
Languages Studied: English
Submission Number: 1607