Keywords: evaluation methodologies, reasoning, logical reasoning, calibration/uncertainty
TL;DR: We propose a new framework based on Markov logic networks to evaluate LLMs' reasoning over inconsistent knowledge and release accompanying datasets.
Abstract: Most natural language reasoning tasks in the research community assume consistent input knowledge. Nevertheless, real-world scenarios often involve inconsistent information, which might lead to divergent conclusions and is typically associated with varying levels of uncertainty. This raises a key research question: can large language models (LLMs) effectively handle uncertainty in their reasoning process to maximize knowledge consistency?
In this paper, we propose a framework for evaluating reasoning over inconsistent knowledge. Our approach models uncertainty via weights on logical rules, leveraging Markov logic networks (MLNs), which integrate probabilistic reasoning with first-order logic. This enables us to quantify inconsistencies in knowledge bases and hence rigorously evaluate LLM reasoning. We introduce two tasks using this framework: 1) question answering (QA), which involves answering questions by integrating inconsistent knowledge; and 2) knowledge rectification, where we aim to rectify language models' acquired knowledge to improve consistency. We curate a dataset of 3,000 MLN-formatted knowledge bases to implement these tasks. We evaluate state-of-the-art LLMs on these tasks and highlight their limitations in uncertainty-aware reasoning over inconsistent logical knowledge.
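To give a flavor of how weights on logical rules capture uncertainty in an MLN, the sketch below builds a toy weighted knowledge base with two conflicting rules and computes a marginal probability by brute-force enumeration of possible worlds. This is a generic illustration under assumed toy predicates (Smokes, Cancer) and weights, not the authors' dataset construction or evaluation code.

```python
# Minimal sketch (hypothetical example, not the paper's implementation):
# a Markov logic network assigns each possible world a probability
# proportional to exp(sum of weights of the ground formulas it satisfies),
# so mutually inconsistent rules are reconciled by their weights.
import itertools
import math

# Hypothetical ground atoms over a two-entity domain.
atoms = ["Smokes(A)", "Smokes(B)", "Cancer(A)", "Cancer(B)"]

# Weighted rules: (weight, satisfaction check over a world).
# A world is a dict mapping each atom to True/False.
rules = [
    # Soft rule "Smokes(x) => Cancer(x)", grounded for each entity.
    (1.5, lambda w: (not w["Smokes(A)"]) or w["Cancer(A)"]),
    (1.5, lambda w: (not w["Smokes(B)"]) or w["Cancer(B)"]),
    # A weaker, conflicting rule: "A does not have cancer".
    (0.8, lambda w: not w["Cancer(A)"]),
]

def unnormalized_weight(world):
    """exp of the total weight of rules satisfied in this world."""
    return math.exp(sum(wt for wt, sat in rules if sat(world)))

# Enumerate all truth assignments to the ground atoms.
worlds = [dict(zip(atoms, values))
          for values in itertools.product([False, True], repeat=len(atoms))]

# Partition function and a marginal query under the inconsistent rule set.
Z = sum(unnormalized_weight(w) for w in worlds)
p_cancer_a = sum(unnormalized_weight(w) for w in worlds if w["Cancer(A)"]) / Z
print(f"P(Cancer(A)) = {p_cancer_a:.3f}")
```

Because the pro-cancer rule carries a larger weight than the conflicting one, the marginal probability of Cancer(A) lands above 0.5 rather than being forced to 0 or 1, which is the kind of graded, uncertainty-aware conclusion the evaluation framework targets.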
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html
Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html
Submission Number: 1505