Assessing the Reliability of LLMs in Faithfully Updating Text

ACL ARR 2025 February Submission3406 Authors

15 Feb 2025 (modified: 09 May 2025) · License: CC BY 4.0
Abstract: This paper addresses the challenge of faithfully representing updated information in text—a task formalized as the FRUIT problem. Given a source document and a set of evidence snippets detailing updates, the goal is to generate an updated document that integrates the new facts while preserving the original coherence and context. We first conduct a comprehensive analysis of the FRUIT dataset, uncovering key structural insights, such as the observation that updated articles tend to be roughly 100 tokens longer than their originals—a factor that may bias models toward appending information rather than editing in place. Our study investigates the unsupervised capabilities of LLMs, including zero-shot learning, chain-of-thought reasoning, self-reflection, and evidence ordering, using both the open-source Llama-3-8B and the closed-source GPT-4 models. Our experiments reveal that a zero-shot setup yields the best performance, and that the format of the evidence significantly impacts model outcomes, with table-based evidence outperforming unstructured text. These findings have important implications for domains requiring precise document updates, such as software engineering and technical documentation.
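The zero-shot setup the abstract describes—source article plus evidence snippets in, updated article out—can be sketched as a simple prompt-assembly step. The function name and prompt wording below are illustrative assumptions, not the authors' actual prompts:

```python
def build_fruit_prompt(source_doc: str, evidences: list[str]) -> str:
    """Assemble a zero-shot prompt for the FRUIT task: present the original
    article and numbered evidence snippets, then ask for the updated article.
    (Hypothetical prompt format, for illustration only.)"""
    evidence_block = "\n".join(f"[{i}] {ev}" for i, ev in enumerate(evidences, 1))
    return (
        "You are given an original article and new evidence.\n"
        "Rewrite the article so it faithfully reflects the evidence, "
        "editing in place rather than simply appending new facts.\n\n"
        f"Original article:\n{source_doc}\n\n"
        f"Evidence:\n{evidence_block}\n\n"
        "Updated article:"
    )

# Example: one-sentence article, one evidence snippet.
prompt = build_fruit_prompt(
    "Ada Lovelace was a mathematician.",
    ["Lovelace is regarded as the first computer programmer."],
)
```

The resulting string would be sent to the model (e.g. Llama-3-8B or GPT-4) as-is; the instruction to edit in place reflects the paper's observation that length differences between original and updated articles can bias models toward appending.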
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: few-shot learning, zero-shot learning, chain-of-thought, faithful text updating
Contribution Types: Model analysis & interpretability, Reproduction study, Surveys
Languages Studied: English
Submission Number: 3406