ROME: Towards Robust Metrics of Factual Consistency with Sentence-Level Contrastive Alignment and Chain-of-Thought
Abstract: While the capabilities of language models have been extensively discussed, they remain prone to hallucinations and factual inconsistencies. In particular, despite the burgeoning interest in using pre-trained language models as automatic evaluation metrics, we find that these widely used models struggle with longer texts and are susceptible to various adversarial attacks. In response, we propose a sentence-level evaluation method that reflects the factual consistency between input and output, and introduce ROME. Further, we propose Fact Chain-of-Thought (FactCoT) prompting, which elicits LLMs to construct a robust meta-evaluation benchmark covering diverse error types, along with approximately 50k factual-consistency training examples derived from six human-annotated datasets. ROME integrates three contrastive objectives to bolster robustness against adversarial inputs; as a sentence-level model, it scales to long inputs and detects factually inconsistent outputs. On the task of evaluating factual consistency in text summarization, ROME significantly outperforms existing models, and it further generalizes to unseen tasks.
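The abstract describes scoring factual consistency sentence by sentence, so that long inputs remain tractable and a single unsupported sentence can be flagged. As a rough illustration of that idea only, here is a minimal Python sketch; the names (score_pair, factual_consistency) are hypothetical, and the token-overlap scorer is a crude stand-in for ROME's contrastively trained model, not the authors' implementation.

    # Minimal sketch of sentence-level factual-consistency scoring.
    # NOT the ROME implementation; score_pair below is a crude
    # token-overlap proxy standing in for a trained sentence scorer.
    import re
    from typing import List


    def split_sentences(text: str) -> List[str]:
        """Naive sentence splitter on terminal punctuation."""
        return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


    def score_pair(source: str, sentence: str) -> float:
        """Stand-in scorer: fraction of the sentence's tokens found in
        the source, in [0, 1]. ROME would instead apply a model trained
        with contrastive objectives here."""
        src_tokens = set(re.findall(r"\w+", source.lower()))
        sent_tokens = re.findall(r"\w+", sentence.lower())
        if not sent_tokens:
            return 0.0
        return sum(t in src_tokens for t in sent_tokens) / len(sent_tokens)


    def factual_consistency(source: str, summary: str) -> float:
        """Score each summary sentence against the source and aggregate
        with min(), so one unsupported sentence drags the score down."""
        sentences = split_sentences(summary)
        if not sentences:
            return 0.0
        return min(score_pair(source, s) for s in sentences)


    if __name__ == "__main__":
        doc = "The meeting was moved to Friday. Alice will chair it."
        print(factual_consistency(doc, "Alice chairs the Friday meeting."))

The pessimistic min() aggregation reflects the intuition that a summary is only as factually consistent as its least-supported sentence; a mean would let one hallucinated claim hide among several faithful ones.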
Paper Type: long
Research Area: Summarization