SymTex: A New Benchmark for Non-monotonic Reasoning Capability of Large Language Models

ICLR 2025 Conference Submission 2501 Authors

22 Sept 2024 (modified: 27 Nov 2024) · ICLR 2025 Conference Submission · CC BY 4.0
Keywords: Non-monotonic Reasoning, Large Language Models
Abstract: Non-monotonic reasoning (NMR) plays a crucial role in logical reasoning, allowing inferences to be revised as new information arises. This adaptability is key for large language models (LLMs) to handle complex problems and adjust their reasoning in dynamic environments, mimicking human-like flexibility of thought. Recent work has mainly explored using LLMs for non-monotonic reasoning through textual logic representations, since LLMs excel at understanding natural language. However, textual logic representations often introduce ambiguity and complexity, especially in intricate scenarios, whereas symbolic logic representations are clearer and more precise, avoiding these issues. In this work, we introduce a framework called Multi-step Generation for Symbolic and Textual NMR Samples (MG-SymTex) that automatically generates diverse non-monotonic samples, and we build a non-monotonic reasoning benchmark, called SymTex, for evaluating the non-monotonic reasoning capability of LLMs. SymTex comprises two types of descriptions and three types of predicates, supporting two primary tasks: Tri-State Boolean Querying and Answer Set Computation. Through comprehensive evaluations, we demonstrate that state-of-the-art LLMs such as gpt-4o, claude-3.5-sonnet, and o1-mini face significant challenges on our proposed benchmark, highlighting the difficulty of non-monotonic reasoning for LLMs.
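For intuition, the following is a minimal sketch of the kind of defeasible inference that the Answer Set Computation task revolves around, written against the clingo Python API; the program text, predicate names, and helper function are our own illustrative assumptions and are not drawn from SymTex itself.

# Minimal sketch of non-monotonic (default) reasoning via answer set
# computation, using the clingo Python API (pip install clingo).
# The rules and predicate names here are illustrative, not from SymTex.
import clingo

PROGRAM = """
bird(tweety).
% Default rule: a bird flies unless it is known to be abnormal.
flies(X) :- bird(X), not abnormal(X).
"""

def answer_sets(extra_facts: str = "") -> list:
    ctl = clingo.Control(["0"])              # "0" = enumerate all models
    ctl.add("base", [], PROGRAM + extra_facts)
    ctl.ground([("base", [])])
    models = []
    ctl.solve(on_model=lambda m: models.append(str(m)))
    return models

print(answer_sets())                          # flies(tweety) is derived
print(answer_sets("abnormal(tweety)."))       # flies(tweety) no longer holds

Adding the fact abnormal(tweety) retracts the previously derived conclusion flies(tweety); this revision of conclusions under new information is precisely the non-monotonic behavior the benchmark probes.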
Supplementary Material: zip
Primary Area: datasets and benchmarks
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2501