Utilizing Language Models For Synthetic Knowledge Graph Generation

Shuran Fu; Peihua Mai; Zhang Jingqi; Yan Pang

Utilizing Language Models For Synthetic Knowledge Graph Generation

Shuran Fu, Peihua Mai, Zhang Jingqi, Yan Pang

Published: 06 Mar 2025, Last Modified: 30 Apr 2025ICLR 2025 Workshop Data Problems PosterEveryoneRevisionsBibTeXCC BY 4.0

Keywords: Synthetic data, large language model, knowledge graph

Abstract: Knowledge Graphs play a pivotal role in various machine-learning tasks. However, constructing these datasets is challenging due to their semantic and structural complexity, often resulting in limited data size. Synthetic graph generation has been applied to augment graph datasets and has proven beneficial in domains such as social network analysis and recommendation systems. Despite this, generating graphs with extensive textual attributes remains underexplored. Large language models (LLMs) possess the capability to generate text and reason about complex data structures, including graphs. In this paper, we leverage the generative and reasoning abilities of LLMs to propose a novel framework for synthetic knowledge graph generation. Our framework integrates two transformers and a text data augmentation module, where prompt and fine-tuning approaches are used to generate sentences and Mahalanobis distance is applied to measure outliers. This framework offers straightforward application and high flexibility, which can effectively generate graph datasets that have a similar triple distribution with the real one. We combine the generated data with real data by either concatenation or mixture way and through extensive experiments on downstream tasks, we demonstrate the effectiveness and versatility of our approach.

Submission Number: 37

Loading