Utilizing Language Models For Synthetic Knowledge Graph Generation
Keywords: Synthetic data, large language model, knowledge graph
Abstract: Knowledge graphs play a pivotal role in various machine-learning tasks. However, constructing these datasets is challenging because of their semantic and structural complexity, which often limits their size. Synthetic graph generation has been applied to augment graph datasets and has proven beneficial in domains such as social network analysis and recommendation systems. Despite this, generating graphs with extensive textual attributes remains underexplored. Large language models (LLMs) can generate text and reason about complex data structures, including graphs. In this paper, we leverage the generative and reasoning abilities of LLMs to propose a novel framework for synthetic knowledge graph generation. Our framework integrates two transformers and a text data augmentation module, in which prompting and fine-tuning approaches are used to generate sentences and the Mahalanobis distance is applied to detect outliers. The framework is straightforward to apply and highly flexible, and it can effectively generate graph datasets whose triple distribution is similar to that of the real data. We combine the generated data with real data by either concatenation or mixing, and extensive experiments on downstream tasks demonstrate the effectiveness and versatility of our approach.
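As an illustration of the outlier-measurement step mentioned in the abstract, the following is a minimal sketch of Mahalanobis-distance filtering, not the paper's implementation. It assumes that real and generated sentences have already been mapped to fixed-length embedding vectors; the function names `mahalanobis_distances` and `filter_outliers` and the threshold value are hypothetical choices made for this example.

```python
import numpy as np


def mahalanobis_distances(reference, candidates):
    """Distance of each candidate vector from the reference distribution.

    reference:  (n, d) array, e.g. embeddings of real sentences (assumed input).
    candidates: (m, d) array, e.g. embeddings of generated sentences.
    """
    reference = np.asarray(reference, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    mean = reference.mean(axis=0)
    # Sample covariance of the reference set (features are columns).
    cov = np.atleast_2d(np.cov(reference, rowvar=False))
    # Pseudo-inverse keeps the computation defined if cov is singular.
    cov_inv = np.linalg.pinv(cov)
    diff = candidates - mean
    # Quadratic form diff_i^T * cov_inv * diff_i for every row i.
    squared = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(np.maximum(squared, 0.0))


def filter_outliers(reference, candidates, threshold=3.0):
    """Return a boolean keep-mask and the distances.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    distances = mahalanobis_distances(reference, candidates)
    return distances <= threshold, distances


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(500, 2))             # stand-in for real embeddings
    generated = np.array([[0.0, 0.0], [10.0, 10.0]])
    keep, dist = filter_outliers(real, generated)
    print(keep, dist)
```

Unlike a plain Euclidean distance, the Mahalanobis distance rescales each direction by the covariance of the reference data, so a generated sample is judged by how unusual it is relative to the spread of the real samples.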
Submission Number: 37