Abstract: Multimodal knowledge graph embedding refers to learning representations of multimodal entities and their relations in a low-dimensional space. However, existing multimodal embedding models tend to ignore the inherent structure of knowledge graphs. To address this issue, we propose a novel multimodal knowledge graph embedding model that simultaneously learns the semantic relations and hierarchical structure of entities within a hyperbolic space. Specifically, we project the feature embeddings of all modalities into a hyperbolic space and unify them to form a multimodal embedding. We then model knowledge graph triplets by treating the relation as a Lorentzian linear transformation from the head entity to the tail entity, and measure the plausibility of triplets with the Lorentz distance. Extensive experiments on multimodal knowledge graph completion benchmarks validate that our model achieves state-of-the-art results across most metrics. In terms of training speed, our model is one order of magnitude faster than the best baseline. The visualization results further reveal our model's ability to capture hierarchical structures. Our code is available at https://github.com/llqy123/HyME.
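The scoring scheme described above (hyperbolic projection of modality features, a relation-specific transformation of the head entity, and a Lorentz-distance plausibility score) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`expmap0`, `score_triplet`), the choice of applying the relation matrix in the tangent space at the origin, and the use of the squared Lorentzian distance are all assumptions made for clarity.

```python
import numpy as np

def lorentz_inner(x, y):
    # Lorentzian inner product: -x0*y0 + sum_i xi*yi.
    # Points on the hyperboloid satisfy <x, x>_L = -1.
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def expmap0(v):
    # Exponential map at the hyperboloid origin: lifts a Euclidean
    # tangent vector v in R^d onto the hyperboloid in R^{d+1}.
    norm = np.linalg.norm(v)
    if norm < 1e-9:
        return np.concatenate(([1.0], np.zeros_like(v)))
    return np.concatenate(([np.cosh(norm)], np.sinh(norm) * v / norm))

def lorentz_dist_sq(x, y):
    # Squared Lorentzian distance between two hyperboloid points.
    return -2.0 - 2.0 * lorentz_inner(x, y)

def score_triplet(head, rel_matrix, tail):
    # Hypothetical scoring sketch: the relation acts as a linear
    # transformation on the (tangent-space) head embedding; both
    # entities are then projected onto the hyperboloid and compared.
    h = expmap0(rel_matrix @ head)
    t = expmap0(tail)
    return -lorentz_dist_sq(h, t)  # higher score = more plausible

# Toy usage: fuse modality features (e.g. text + image) by averaging
# before projection; the fusion rule here is illustrative only.
text_feat, img_feat = np.array([0.3, 0.1]), np.array([0.1, 0.5])
head = (text_feat + img_feat) / 2.0
tail = np.array([0.2, 0.3])
rel = np.eye(2)
print(score_triplet(head, rel, tail))
```

A point mapped by `expmap0` satisfies `lorentz_inner(x, x) == -1` up to floating-point error, and the squared Lorentz distance of a point to itself is zero, which is what makes the negative distance usable as a plausibility score.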
External IDs: dblp:conf/icassp/LiangWWBY25