Keywords: Knowledge Graphs, Natural Language Generation, Semantic Fidelity, Synthetic Data Generation, Large Language Models, Self-Supervised Learning, KG-to-Text, Information Extraction, Round-Trip Consistency, Industrial AI
TL;DR: The paper introduces a self-correcting pipeline to ensure Large Language Models (LLMs) accurately turn structured data (Knowledge Graphs) into text without hallucinating or losing facts.
Abstract: Knowledge Graphs (KGs) are the backbone of reliable industrial data strategies, yet verbalizing them with Large Language Models (LLMs) often introduces risks that are unacceptable in high-stakes applications, such as hallucinated facts or omitted relations. To enforce strict semantic fidelity in KG-to-text generation, we introduce a self-supervised round-trip pipeline. The system verbalizes KG triples into text and immediately attempts to reconstruct the original graph from that text; only verbalizations that enable perfect graph recovery are retained. This creates a closed feedback loop that guarantees the generated text is semantically equivalent to the source data. Experiments confirm that our automated round-trip consistency score correlates strongly with expert judgment, effectively acting as a scalable proxy for human review. Furthermore, we show that standard LLMs can bootstrap their own KG-extraction and generation capabilities by fine-tuning on this trusted synthetic data. Our approach yields significant improvements in triple-extraction accuracy and verbalization faithfulness without relying on costly manual annotation or massive teacher models, offering a practical path to deploying trustworthy, KG-grounded AI systems.
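The round-trip filtering loop from the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `verbalize` and `extract_triples` are hypothetical stand-ins for the LLM verbalizer and extractor, and the consistency score here is simple set F1 over triples.

```python
def verbalize(triples):
    # Stand-in verbalizer: in the paper this would be an LLM call.
    return " ".join(f"{s} {r} {o}." for s, r, o in sorted(triples))

def extract_triples(text):
    # Stand-in extractor: parses the toy sentences back into triples.
    return {tuple(sent.split()) for sent in text.rstrip(".").split(". ") if sent}

def round_trip_score(source, recovered):
    # Set F1 between source and recovered triples; 1.0 means perfect recovery.
    tp = len(source & recovered)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(recovered), tp / len(source)
    return 2 * precision * recall / (precision + recall)

def filter_verbalizations(graphs):
    # Keep only verbalizations whose extracted graph matches the source exactly,
    # i.e. the closed feedback loop described in the abstract.
    kept = []
    for triples in graphs:
        text = verbalize(triples)
        if round_trip_score(triples, extract_triples(text)) == 1.0:
            kept.append((triples, text))
    return kept
```

Only the retained (triples, text) pairs would then feed the self-supervised fine-tuning stage.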
Submission Type: Emerging
Copyright Form: pdf
Submission Number: 496