Keywords: benchmarking, NLP datasets, evaluation methodologies
TL;DR: We present NUBench, a benchmark that evaluates LLMs’ sentence-level understanding of negation by contrasting standard negation with carefully designed alternatives.
Abstract: Negation is a fundamental linguistic phenomenon that presents ongoing challenges for Large Language Models (LLMs), especially in tasks that require a deep understanding of semantics. Existing benchmarks often treat negation as a minor aspect of broader tasks, such as natural language inference; as a result, there is a lack of benchmarks specifically designed to evaluate negation comprehension. In this work, we introduce **NUBench**—a novel benchmark explicitly created to assess sentence-level understanding of negation in LLMs. NUBench goes beyond testing the detection of surface-level negation cues by contrasting standard negation with structurally diverse alternatives, such as local negation, contradiction, and paraphrase. The benchmark comprises manually curated sentence-negation pairs and a multiple-choice dataset, enabling a thorough evaluation of models' understanding of negation.
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 17103