Quantifying Contextual Hallucinations in NLP Research Papers Before and After the LLM Era
Keywords: Hallucination, LLM, Scientific Writing, Context Inconsistency, Faithfulness, NLP, Academic Integrity
TL;DR: This study investigates contextual inconsistencies (a type of hallucination) in NLP research papers before and after the rise of LLMs.
Abstract: The emergence of Large Language Models (LLMs) has drawn growing attention to their potential implications for academic writing. Although LLMs demonstrate impressive generative capabilities, they are also known to produce inaccurate yet confidently presented content, a phenomenon known as hallucination. When such hallucinated content appears in scientific writing, it can undermine clarity, introduce context inconsistencies, and potentially compromise the integrity of the research narrative. In this study, we focus on detecting one specific type of hallucination, namely "context inconsistency," a form of faithfulness hallucination. Using NLP as a case study, we investigate the prevalence of contextual contradictions in NLP research papers across two distinct periods: before and after the widespread availability of LLMs. The paper explores the evolution of inconsistencies, identifies potential LLM-induced discrepancies in the post-LLM era, and evaluates their severity at the paragraph level. Through an in-depth analysis of the frequency and intensity of contextual hallucination, we observe an increase in inconsistent research articles over time, especially in 2023 and 2024. Interestingly, while inconsistency rates rise overall, the share attributed to AI-generated text has declined, suggesting that such content is becoming harder to distinguish from human writing.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 95