Abstract: Large language models (LLMs) have demonstrated remarkable capabilities in many fields, especially in complex natural language processing tasks. Despite their impressive performance, the content generated by LLMs still suffers from hallucination, particularly in tasks that require real-time data or specialized domain knowledge. Knowledge graphs and multimodal unstructured data serve as important sources of knowledge that can help address the hallucination issues of LLMs. However, existing methods mostly utilize knowledge graphs or multimodal unstructured data in isolation, neglecting the interaction between the two, even though it is precisely this interaction that enables the extraction of deep knowledge from the knowledge base. In this paper, we propose a novel framework called the Collaborative Framework of Multimodal Unstructured Data and Knowledge Graph (CoMuS-KG). This framework enhances the reasoning capabilities of LLMs by enabling interaction between multimodal unstructured data and knowledge graphs, extracting deep knowledge from unstructured data, and completing missing information in knowledge graphs. Specifically, CoMuS-KG first decomposes the question posed to the LLM into multiple sub-questions and converts these sub-questions into knowledge graph triplets with a missing head entity, tail entity, or relation. The knowledge graph and multimodal unstructured data are then used to complete these triplets. Finally, we use the completed triplets to answer the original question, and the completed triplets can be written back into the knowledge graph to assist in other reasoning tasks. Extensive experiments on three KGQA benchmark datasets demonstrate the question-answering performance and reasoning capabilities of CoMuS-KG. Our code is publicly available at: https://github.com/GuChongAn/CoMuS-KG
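The pipeline the abstract describes (decompose the question into triplets with one missing slot, complete them from the knowledge graph and from facts extracted out of unstructured data, then write completed triplets back) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation; all names, the toy facts, and the matching logic are assumptions made for exposition.

```python
# Hypothetical sketch of the CoMuS-KG triplet-completion step.
# Triplets are (head, relation, tail) tuples; None marks the missing slot
# produced when a sub-question is converted into a query triplet.

# Toy knowledge graph (structured source).
KG = {("Inception", "directed_by", "Christopher Nolan")}

# Toy facts an extraction model might pull from multimodal unstructured data.
UNSTRUCTURED_FACTS = {
    ("Christopher Nolan", "country_of_citizenship", "United Kingdom"),
}

def complete_triplet(head, relation, tail, kg, extracted):
    """Fill the missing (None) slot of a query triplet.

    The knowledge graph is tried first; extracted unstructured facts
    serve as a fallback, which is where the two sources interact.
    """
    for source in (kg, extracted):
        for h, r, t in source:
            if (head in (None, h)
                    and relation in (None, r)
                    and tail in (None, t)):
                return (h, r, t)
    return None

# Sub-question 1: ("Inception", "directed_by", ?) -> answered from the KG.
t1 = complete_triplet("Inception", "directed_by", None, KG, UNSTRUCTURED_FACTS)

# Sub-question 2 chains t1's answer; here only the unstructured source
# has the fact, so the completed triplet is new knowledge.
t2 = complete_triplet(t1[2], "country_of_citizenship", None,
                      KG, UNSTRUCTURED_FACTS)

# Completed triplets are written back to the graph for future reasoning.
KG.add(t2)
```

The chained completion (feeding `t1`'s tail into `t2`'s head) mirrors how answering the original question is reduced to a sequence of triplet completions, and the final `KG.add` corresponds to updating the graph with newly extracted deep knowledge.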
External IDs: dblp:conf/cscwd/HuWXGWS25