A Study on the Reasoning Ability of LLMs Using Negation Principles in Chinese Sentences

Ran Li; Lingxiang Fan; Gan Ze; Zhaoyang Gao; Lifei Wang; Renjia Xiao; Zhe Yu; Guiyun Zhao; Zhe Lin

Published: 15 Nov 2025, Last Modified: 08 Mar 2026

Venue: AAAI 2026 Bridge LMReasoning Oral

License: CC BY 4.0

Keywords: Large Language Models (LLMs), Negation-Based Inference, Non-Classical Reasoning, Non-Classical Logic, Logical Negation Properties, Robustness Evaluation, Logical Reasoning Ability Evaluation

Abstract: The reasoning abilities of large language models (LLMs) have attracted growing interest in both AI and logic. In this paper, we examine the extent to which fourteen LLMs (eleven Chinese LLMs and three English LLMs) understand the logical properties of negation in inferential contexts in Chinese. We focus on fifteen well-studied properties of negation in classical propositional logic, such as the inference: “If it rains, then the ground gets wet. Therefore: If the ground is not wet, then it did not rain.” These principles are fundamental to deductive reasoning and serve as building blocks for various non-classical logics. The study of different forms of negation has long been central to philosophical logic, and assessing how LLMs handle negation-based inferences provides a more fine-grained evaluation of their reasoning capabilities, as well as a comparison with human logical competence. Overall, the LLMs we tested perform reasonably well on negation reasoning in Chinese, but they show a clear resistance to certain negation properties: the models often fail to accept these rules and perform poorly on inferences that rely on them. Analysis by specific negation pattern reveals substantial differences: some forms of negation are consistently more difficult, and models sometimes produce answers that directly contradict logical expectations. Zero-shot chain-of-thought prompting improves consistency to some extent, but the degree of improvement varies across models and properties and is not consistently large. Under robustness tests, even high-performing models can show notable declines. In addition, we observe that in certain cases some models are more prone to adopt non-classical negation patterns, such as intuitionistic negation or minimal negation, rather than classical negation.
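The rain example in the abstract is an instance of contraposition: from p → q, infer ¬q → ¬p. As a minimal sketch of what "valid in classical propositional logic" means for such an inference, the following Python snippet checks it by brute-force truth table and contrasts it with a classically invalid pattern (from p → q, infer ¬p → ¬q). The function names `implies` and `is_classically_valid` are illustrative assumptions and are not taken from the paper; the check covers classical two-valued semantics only and says nothing about intuitionistic or minimal negation.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material implication under classical two-valued semantics."""
    return (not a) or b


def is_classically_valid(premise, conclusion, n_vars: int) -> bool:
    """Return True if every truth assignment that satisfies the premise
    also satisfies the conclusion (hypothetical helper for illustration)."""
    return all(
        implies(premise(*vals), conclusion(*vals))
        for vals in product([False, True], repeat=n_vars)
    )


# Contraposition: from "p -> q" infer "not q -> not p".
contraposition = is_classically_valid(
    lambda p, q: implies(p, q),
    lambda p, q: implies(not q, not p),
    2,
)

# Denying the antecedent: from "p -> q" infer "not p -> not q".
# This fails when p is false and q is true.
denying_antecedent = is_classically_valid(
    lambda p, q: implies(p, q),
    lambda p, q: implies(not p, not q),
    2,
)

print("contraposition valid:", contraposition)
print("denying the antecedent valid:", denying_antecedent)
```

With p read as "it rains" and q as "the ground gets wet", the first check corresponds to the inference quoted in the abstract, and the second to the mistaken conclusion "if it does not rain, the ground is not wet".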

Submission Number: 39
