Keywords: benchmarking, NLP datasets, evaluation methodologies
TL;DR: We present NUBench, a benchmark that evaluates LLMs’ sentence-level understanding of negation by contrasting standard negation with carefully designed alternatives.
Abstract: Negation is a fundamental linguistic phenomenon that presents ongoing challenges for Large Language Models (LLMs), especially in tasks that require a deep understanding of semantics. Existing benchmarks often treat negation as a minor aspect of broader tasks, such as natural language inference; as a result, there is a lack of benchmarks specifically designed to evaluate negation comprehension. In this work, we introduce **NUBench**—a novel benchmark explicitly created to assess sentence-level understanding of negation in LLMs. NUBench goes beyond testing the detection of surface-level negation cues by contrasting standard negation with structurally diverse alternatives, such as local negation, contradiction, and paraphrase. The benchmark comprises manually curated sentence-negation pairs and a multiple-choice dataset, enabling a thorough evaluation of models' understanding of negation.
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 17103