Reference (In-)Determinacy in Natural Language Inference

ACL ARR 2024 June Submission 4113 Authors

16 Jun 2024 (modified: 02 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Natural Language Inference (NLI) provides a general task format for evaluating the semantic relation between two pieces of text, and is useful for applications such as fact verification and text attribution. However, existing NLI datasets, and the models trained on them, make assumptions about the context from which the premise and hypothesis are sampled. In this paper, we revisit the reference determinacy (RD) assumption in NLI: the premise and hypothesis are assumed to refer to the same context when human raters annotate a label. While RD is a practical assumption for constructing a new NLI dataset, we observe that current NLI models, which are typically trained solely on premise-hypothesis pairs created under the RD assumption, fail in many practical settings where the premise and hypothesis may refer to different contexts. To highlight the impact of this phenomenon in real-world use cases, we introduce ReFNLI, a diagnostic benchmark for identifying reference ambiguity in NLI examples. In ReFNLI, the premise is retrieved from a knowledge source (here, Wikipedia) and does not necessarily refer to the same context as the hypothesis. On ReFNLI, we demonstrate that both finetuned NLI models and few-shot prompted LLMs fail to recognize context mismatch, leading to > 80% false contradiction and > 50% false entailment predictions. We find that reference ambiguity in NLI examples can in part explain the inherent human disagreement in NLI labels, and we provide insight into how the RD assumption impacts the NLI dataset creation process.
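A minimal sketch (not from the paper) of the failure mode the abstract describes: probing an off-the-shelf NLI model with a premise and hypothesis that share a surface form but plausibly refer to different entities. The model choice (roberta-large-mnli) and the example sentence pair are illustrative assumptions, not materials from the ReFNLI benchmark.

# Sketch: eliciting a false contradiction from an NLI model when the
# premise and hypothesis refer to different contexts. Assumes the
# Hugging Face transformers library; the model and sentences below are
# illustrative, not taken from ReFNLI.
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

# A premise retrieved from a knowledge source about one "John Smith";
# the hypothesis describes a different person with the same name.
premise = "John Smith (1781-1852) was an English landscape painter."
hypothesis = "John Smith is a professional basketball player."

# A model trained under the reference determinacy assumption treats the
# two mentions as coreferent and tends to predict CONTRADICTION, even
# though the sentences may simply describe different individuals.
print(nli({"text": premise, "text_pair": hypothesis}))
# e.g. [{'label': 'CONTRADICTION', 'score': 0.99}]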
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: natural language inference, textual entailment, data influence
Contribution Types: Model analysis & interpretability, Data resources, Data analysis
Languages Studied: English
Submission Number: 4113