MultiLogicNMR(er): A Benchmark and Neural-Symbolic Framework for Non-monotonic Reasoning Tasks with Multiple Extensions

ACL ARR 2024 June Submission 4325 Authors

16 Jun 2024 (modified: 02 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Non-monotonic reasoning is a classic reasoning paradigm widely used in daily life and in legal reasoning. Existing work, such as $\delta$-NLI and LogicNMR, has only preliminarily explored the non-monotonic reasoning ability of pre-trained language models (LMs) in natural language; the performance of large language models (LLMs) on complex non-monotonic reasoning tasks with multiple extensions, where an extension can be interpreted as a set of plausible conclusions, has not yet been studied. In this paper, we automatically synthesize a non-monotonic reasoning dataset with multiple extensions, MultiLogicNMR, and systematically evaluate prompt-based and fine-tuned LLMs under both skeptical and credulous reasoning. Skeptical reasoning accepts only the facts common to all extensions, whereas credulous reasoning accepts any fact that appears in at least one extension. In addition, inspired by classic symbolic solvers, we propose a neural-symbolic framework, MultiLogicNMRer, to improve the non-monotonic reasoning ability of LLMs. Experimental results show that MultiLogicNMRer based on ChatGPT 3.5 achieves accuracy about 23.1\% higher $(46.2\% \rightarrow 69.3\%)$ than the corresponding prompt-based LLM. The proposed MultiLogicNMR dataset and MultiLogicNMRer framework are expected to promote research on the non-monotonic reasoning of LLMs in natural language.
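The skeptical/credulous distinction in the abstract can be read as an intersection/union test over extensions. The following minimal sketch is illustrative only and is not the authors' code; the extension contents and fact names are hypothetical.

```python
# Skeptical vs. credulous entailment over extensions,
# where each extension is a set of plausible conclusions.

def skeptical_entailed(extensions, fact):
    """A fact is skeptically entailed iff it appears in every extension."""
    return all(fact in ext for ext in extensions)

def credulous_entailed(extensions, fact):
    """A fact is credulously entailed iff it appears in at least one extension."""
    return any(fact in ext for ext in extensions)

# Toy example (hypothetical): a "Nixon diamond"-style theory with two extensions.
extensions = [
    {"quaker", "republican", "pacifist"},
    {"quaker", "republican", "not_pacifist"},
]

print(skeptical_entailed(extensions, "quaker"))    # True: present in all extensions
print(skeptical_entailed(extensions, "pacifist"))  # False: present in only one extension
print(credulous_entailed(extensions, "pacifist"))  # True: present in at least one extension
```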
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: corpus creation, benchmarking, evaluation
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources, Data analysis
Languages Studied: English
Submission Number: 4325