Multimodal Knowledge Graph Embeddings via Lorentz-based Contrastive Learning

Published: 01 Jan 2024, Last Modified: 16 May 2025 · ICME 2024 · CC BY-SA 4.0
Abstract: Multimodal knowledge graph embeddings (MKGE) have recently garnered significant attention. Unlike traditional unimodal knowledge graph embeddings, MKGE integrates both structural and multimodal knowledge to represent entities within a unified framework. However, real-world entities are heterogeneous, which often leads to semantic inconsistencies: entities with similar structural embeddings may diverge significantly in their multimodal representations. Previous approaches primarily focus on directly fusing structural and multimodal embeddings, thereby overlooking this semantic-embedding inconsistency. To tackle this issue, we propose a new multimodal knowledge graph embedding method via Lorentz-based contrastive learning (LCKGE). We first introduce a carefully designed nearest-neighbor fusion module based on contrastive learning for multimodal fusion. Then, an attention-based Lorentz transformation is proposed to capture more complex geometric information in MKGs. Furthermore, a series of comprehensive experiments demonstrates the effectiveness of our model. We provide the code and appendix of LCKGE at https://github.com/RuizhouLiu/LCKGE.
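To make the "Lorentz-based contrastive learning" idea more concrete, below is a minimal, illustrative sketch, not the authors' implementation, of how structural and multimodal embeddings of the same entity can be pulled together under a Lorentz (hyperboloid) geometry with an InfoNCE-style objective. The function names (`lift_to_hyperboloid`, `lorentz_contrastive_loss`) and the specific loss form are assumptions chosen for clarity; the actual LCKGE modules are described in the paper and repository.

```python
# Illustrative sketch only: Lorentz-model distance + contrastive (InfoNCE-style) loss.
import torch
import torch.nn.functional as F

def lift_to_hyperboloid(v: torch.Tensor) -> torch.Tensor:
    """Map Euclidean vectors v in R^d onto the hyperboloid model (curvature -1)
    by setting the time-like coordinate to sqrt(1 + ||v||^2)."""
    x0 = torch.sqrt(1.0 + (v * v).sum(dim=-1, keepdim=True))
    return torch.cat([x0, v], dim=-1)

def lorentz_inner(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Lorentzian inner product <x, y>_L = -x_0 y_0 + sum_i x_i y_i."""
    prod = x * y
    return -prod[..., 0] + prod[..., 1:].sum(dim=-1)

def lorentz_distance(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Geodesic distance on the hyperboloid: arccosh(-<x, y>_L)."""
    inner = -lorentz_inner(x, y)
    return torch.acosh(torch.clamp(inner, min=1.0 + 1e-7))

def lorentz_contrastive_loss(struct_emb, multi_emb, temperature=0.1):
    """InfoNCE-style loss: each entity's multimodal embedding is the positive
    for its structural embedding; other entities in the batch act as negatives.
    Similarity is the negative Lorentz distance."""
    xs = lift_to_hyperboloid(struct_emb)   # (B, d+1)
    xm = lift_to_hyperboloid(multi_emb)    # (B, d+1)
    # Pairwise distances between all structural/multimodal pairs in the batch.
    dist = lorentz_distance(xs.unsqueeze(1), xm.unsqueeze(0))  # (B, B)
    logits = -dist / temperature
    targets = torch.arange(struct_emb.size(0), device=struct_emb.device)
    return F.cross_entropy(logits, targets)

# Example usage with random embeddings for 8 entities of dimension 32.
struct = torch.randn(8, 32)
multi = torch.randn(8, 32)
print(lorentz_contrastive_loss(struct, multi).item())
```

In this sketch, aligning the two views on the hyperboloid rather than in Euclidean space is what allows the distance to reflect hierarchical or more complex geometric structure; the attention-based Lorentz transformation mentioned in the abstract would operate on such hyperboloid points.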