Beyond Entities: A Large-Scale Multi-Modal Knowledge Graph with Triplet Fact Grounding

Published: 01 Jan 2024, Last Modified: 21 Jul 2025, Venue: AAAI 2024, License: CC BY-SA 4.0
Abstract: Much effort has been devoted to building multi-modal knowledge graphs by associating entities with images, while the multi-modal information carried by the relations between entities has largely been ignored. In this paper, we therefore construct a new large-scale multi-modal knowledge graph whose triplet facts are grounded in images that depict not only the entities but also their relations. To this end, we propose a novel pipeline consisting of triplet fact filtering, image retrieval, entity-based image filtering, relation-based image filtering, and image clustering. Using this pipeline, we build a multi-modal knowledge graph named ImgFact, which contains 247,732 triplet facts and 3,730,805 images. In experiments, both manual and automatic evaluations confirm the quality of ImgFact. We further use the collected images to enhance model performance on two tasks. In particular, on relation classification, the model enhanced with ImgFact improves F1 by 8.38% and 9.87% over solutions enhanced by an existing multi-modal knowledge graph and by VisualChatGPT, respectively. We release ImgFact and its usage instructions at https://github.com/kleinercubs/ImgFact.
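The five-stage pipeline named in the abstract can be pictured as a chain of filters applied to the candidate images of each triplet fact. The Python sketch below is a minimal, hypothetical illustration of that composition only. Every stage body is a placeholder: the toy in-memory index, the per-image `entities` set, the precomputed `relation_score`, the `threshold`, and the `aspect` key used for clustering are all assumptions made for illustration, and none of them reproduces the authors' actual retrieval, filtering, or clustering models.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# A triplet fact: (head entity, relation, tail entity).
Triplet = Tuple[str, str, str]


@dataclass
class Image:
    image_id: str
    entities: Set[str]     # hypothetical: entities detected in the image
    relation_score: float  # hypothetical: precomputed image-relation match score
    aspect: str            # hypothetical: key used to group similar images


@dataclass
class GroundedFact:
    triplet: Triplet
    clusters: Dict[str, List[str]] = field(default_factory=dict)


def filter_triplets(triplets: List[Triplet], visual_relations: Set[str]) -> List[Triplet]:
    # Stage 1 (placeholder): keep facts whose relation is assumed to be depictable.
    return [t for t in triplets if t[1] in visual_relations]


def retrieve_images(t: Triplet, index: Dict[Triplet, List[Image]]) -> List[Image]:
    # Stage 2 (placeholder): a toy in-memory lookup stands in for a real search engine.
    return list(index.get(t, []))


def entity_filter(t: Triplet, images: List[Image]) -> List[Image]:
    # Stage 3 (placeholder): both the head and the tail entity must appear.
    head, _, tail = t
    return [img for img in images if head in img.entities and tail in img.entities]


def relation_filter(images: List[Image], threshold: float) -> List[Image]:
    # Stage 4 (placeholder): keep images whose hypothetical relation score clears a threshold.
    return [img for img in images if img.relation_score >= threshold]


def cluster_images(images: List[Image]) -> Dict[str, List[str]]:
    # Stage 5 (placeholder): group the surviving images by their aspect key.
    groups: Dict[str, List[str]] = defaultdict(list)
    for img in images:
        groups[img.aspect].append(img.image_id)
    return dict(groups)


def build_graph(
    triplets: List[Triplet],
    visual_relations: Set[str],
    index: Dict[Triplet, List[Image]],
    threshold: float = 0.5,
) -> List[GroundedFact]:
    facts: List[GroundedFact] = []
    for t in filter_triplets(triplets, visual_relations):
        images = retrieve_images(t, index)
        images = entity_filter(t, images)
        images = relation_filter(images, threshold)
        if images:  # a fact with no surviving image stays ungrounded and is dropped
            facts.append(GroundedFact(t, cluster_images(images)))
    return facts


if __name__ == "__main__":
    paris = ("Paris", "capitalOf", "France")
    index = {
        paris: [
            Image("img1", {"Paris", "France"}, 0.9, "skyline"),
            Image("img2", {"Paris"}, 0.8, "street"),         # fails entity filter
            Image("img3", {"Paris", "France"}, 0.2, "map"),  # fails relation filter
            Image("img4", {"Paris", "France"}, 0.7, "skyline"),
        ]
    }
    triplets = [paris, ("Alice", "bornIn", "Berlin")]
    for fact in build_graph(triplets, {"capitalOf"}, index):
        print(fact)
```

In this toy run, img2 is dropped because it lacks the tail entity, img3 because its relation score falls below the threshold, and the bornIn fact because its relation is outside the assumed set of depictable relations. Keeping the entity-level and relation-level checks as separate stages reflects the abstract's point that a grounding image should show the relation, not only the two entities.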