Systematic Assessment of Factual Knowledge in Large Language Models

Published: 07 Oct 2023, Last Modified: 13 Mar 2025, EMNLP 2023 Findings
Submission Type: Regular Short Paper
Submission Track: Language Modeling and Analysis of Language Models
Keywords: large language models, hallucination, knowledge graph
Abstract: Previous studies have relied on existing question-answering benchmarks to evaluate the knowledge stored in large language models (LLMs). However, this approach has limitations regarding factual knowledge coverage, as it mostly focuses on generic domains which may overlap with the pretraining data. This paper proposes a framework to systematically assess the factual knowledge of LLMs by leveraging knowledge graphs (KGs). Our framework automatically generates a set of questions and expected answers from the facts stored in a given KG, and then evaluates the accuracy of LLMs in answering these questions. We systematically evaluate state-of-the-art LLMs with KGs in generic and specific domains. The experiments show that ChatGPT is consistently the top performer across all domains. We also find that LLM performance depends on instruction finetuning, domain, and question complexity, and that LLMs are vulnerable to adversarial context.
Submission Number: 5105
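
The pipeline described in the abstract (generate questions and expected answers from KG facts, then score a model's answers) can be illustrated with a minimal sketch. This is not the authors' implementation: the question template, the exact-match scoring rule, the toy facts, and the names `triples_to_questions`, `accuracy`, and `toy_model` are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# A KG fact as a (subject, relation, object) triple.
Triple = Tuple[str, str, str]


def triples_to_questions(triples: List[Triple]) -> List[Tuple[str, str]]:
    """Turn each fact into a (question, expected_answer) pair.

    Assumption: a single fixed template; the paper may generate
    questions differently.
    """
    return [(f"What is the {rel} of {subj}?", obj) for subj, rel, obj in triples]


def accuracy(qa_pairs: List[Tuple[str, str]], answer_fn: Callable[[str], str]) -> float:
    """Fraction of questions answered correctly.

    Assumption: case-insensitive exact match against the expected answer.
    """
    if not qa_pairs:
        return 0.0
    correct = sum(
        1
        for question, expected in qa_pairs
        if answer_fn(question).strip().lower() == expected.strip().lower()
    )
    return correct / len(qa_pairs)


# Toy facts, invented for this example only.
toy_kg: List[Triple] = [
    ("France", "capital", "Paris"),
    ("Japan", "capital", "Tokyo"),
]


def toy_model(question: str) -> str:
    """Hypothetical stand-in for an LLM call; knows only one fact."""
    return "Paris" if "France" in question else "unknown"


qa = triples_to_questions(toy_kg)
score = accuracy(qa, toy_model)
print(score)
```

In practice `toy_model` would be replaced by a call to the LLM under evaluation, and the same `accuracy` function could be run separately on KGs from different domains to compare results.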