Hybrid End-to-End Knowledge Graph Construction and Validation: A Cross-Domain Study with LLM-as-a-Judge
Keywords: Knowledge Graph, End-to-End Construction, Large Language Models, LLM-as-a-Judge
Abstract: The automated construction of knowledge graphs (KGs) from unstructured text remains a central challenge in information management and artificial intelligence. This paper introduces a hybrid framework that combines the conceptual reasoning of large language models (LLMs) with the efficiency of scalable, rule-based methods to deliver an end-to-end pipeline for KG construction and validation. The framework begins with ontology induction using an LLM to define domain-specific entity and relation types, followed by large-scale rule-based information extraction, entity resolution, and graph assembly. A novel extrinsic evaluation method, \emph{LLM-as-a-Judge}, is employed to assess the semantic quality of the resulting graphs.
We evaluate the pipeline across three diverse benchmarks. In the financial domain, the FiQA dataset (5{,}500+ documents) yielded a graph with 475 nodes and 36 edges, achieving an overall quality score of 2.97/5 at a total cost of \$2.63. In the document-level relation extraction setting, the DocRED dataset (100 annotated documents) produced 5{,}000 nodes and 389 edges, with a lower quality score of 2.68/5, primarily due to systematic entity type misclassification. In the biomedical domain, the CDR dataset (100 sampled abstracts) generated 966 nodes and 13 edges but achieved the highest semantic precision, with an overall quality score of 3.91/5 at a cost of \$0.65. Across all datasets, the pipeline proved efficient, with end-to-end processing times under one hour, and the benchmarks exposed complementary strengths and weaknesses: FiQA delivered scale but sparse connectivity, DocRED revealed entity classification challenges, and CDR achieved high entity-level precision despite graph fragmentation.
These results validate the effectiveness of hybrid architectures for KG construction: LLMs provide strong conceptual modeling, while rule-based systems ensure scalability and cost-efficiency. The \emph{LLM-as-a-Judge} framework further supplies actionable feedback, exposing domain-specific error modes and guiding refinement. Our work establishes a cost-effective, modular, and adaptable methodology for automated KG construction, offering a foundation for future research on improving connectivity, refining extraction accuracy, and extending to new domains.
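The \emph{LLM-as-a-Judge} evaluation described above can be illustrated with a minimal sketch. The rubric prompt, the `judge_triple` helper, and the 1-to-5 clamping below are illustrative assumptions about how such a judge might be wired up, not the authors' actual implementation; a real pipeline would replace the stub with a call to an LLM API.

```python
from typing import Callable, Iterable

# Hypothetical rubric prompt: the judge model rates one triple on a 1-5 scale.
JUDGE_PROMPT = (
    "Rate the semantic quality of this knowledge-graph triple on a 1-5 scale. "
    "Reply with a single integer.\n"
    "Triple: ({head}, {relation}, {tail})\n"
    "Source text: {context}"
)

def judge_triple(triple: tuple[str, str, str], context: str,
                 ask_llm: Callable[[str], str]) -> int:
    """Score one (head, relation, tail) triple with a judge model."""
    head, relation, tail = triple
    prompt = JUDGE_PROMPT.format(head=head, relation=relation,
                                 tail=tail, context=context)
    score = int(ask_llm(prompt).strip())
    return min(max(score, 1), 5)  # clamp the reply to the 1-5 rubric

def graph_quality(triples: Iterable[tuple[str, str, str]],
                  contexts: Iterable[str],
                  ask_llm: Callable[[str], str]) -> float:
    """Overall graph quality as the mean triple score (e.g. a 3.91/5 average)."""
    scores = [judge_triple(t, c, ask_llm) for t, c in zip(triples, contexts)]
    return sum(scores) / len(scores)

# Usage with a stub judge standing in for an LLM API call:
stub = lambda prompt: "4"
triples = [("aspirin", "treats", "headache")]
contexts = ["Aspirin is commonly used to treat headache."]
print(graph_quality(triples, contexts, stub))  # 4.0 with the stub judge
```

Keeping the judge behind a plain callable makes the scoring logic testable offline, and the per-triple scores double as the actionable, error-mode-level feedback the abstract describes.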
Submission Number: 183