Commitment-Aware Axiomatic Coherence: Measuring Non-Vacuous Consistency in LLM Logical Reasoning

Published: 05 Mar 2026, Last Modified: 25 Apr 2026, ICLR 2026 Workshop LLM Reasoning, CC BY 4.0
Track: long paper (up to 10 pages)
Keywords: logical reasoning evaluation, commitment-aware coherence, negation-consistency violation, abstention and coverage, FOLIO benchmark
TL;DR: Coherence checks can look good through abstention alone; reporting a commitment metric alongside the negation-violation rate reveals an abstention–contradiction frontier on FOLIO v0.0 (204 examples).
Abstract: Large language models (LLMs) are increasingly used for logical tasks, yet they frequently contradict themselves across closely related queries. A natural response is to measure logical coherence by checking axioms such as negation consistency. However, we show that coherence can be vacuous: a model can appear consistent simply by refusing to commit to either a statement or its negation. We propose commitment-aware axiomatic coherence, a lightweight evaluation protocol that complements a standard negation-coherence check with a commitment score measuring how much probability mass the model assigns to entailed vs. refuted outcomes (as opposed to abstention or uncertainty). Using a deterministic log-probability elicitation procedure over YES/NO answers and a simple three-way decision rule (True/False/Uncertain), we evaluate four open LLMs on the public FOLIO v0.0 validation split. The results reveal a clear frontier: some models achieve low contradiction rates primarily by abstaining (low coverage), while others achieve high coverage at the cost of pervasive negation-coherence violations. Our findings argue that reliable evaluation of logical reasoning requires reporting both coherence and non-vacuous commitment, not coherence alone. The project is available at https://meherabb.github.io/Commitment/
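The protocol in the abstract (YES/NO log-probability elicitation, a three-way decision rule, a commitment score, and a negation-coherence check) can be illustrated with a minimal sketch. It is not the authors' implementation: the threshold `TAU`, the symmetric decision rule, the commitment formula `|2p - 1|`, the decision-level definition of a negation violation, and all names (`p_yes`, `decide`, `commitment`, `Pair`, `evaluate`) are hypothetical choices made for illustration only. The usage lines contrast a model that always abstains with one that answers YES to both a statement and its negation.

```python
import math
from dataclasses import dataclass
from typing import List

# Hypothetical decision threshold; the abstract does not specify one.
TAU = 0.7

def p_yes(logp_yes: float, logp_no: float) -> float:
    """Renormalize YES/NO log-probabilities into P(YES | {YES, NO})."""
    m = max(logp_yes, logp_no)  # subtract the max for numerical stability
    a, b = math.exp(logp_yes - m), math.exp(logp_no - m)
    return a / (a + b)

def decide(p: float, tau: float = TAU) -> str:
    """Assumed symmetric three-way rule: True / False / Uncertain."""
    if p >= tau:
        return "True"
    if p <= 1.0 - tau:
        return "False"
    return "Uncertain"

def commitment(p: float) -> float:
    """Assumed commitment score in [0, 1]: 0 at p = 0.5, 1 at p = 0 or 1."""
    return abs(2.0 * p - 1.0)

@dataclass
class Pair:
    # YES/NO log-probs for a statement S and for its negation not-S.
    s_yes: float
    s_no: float
    neg_yes: float
    neg_no: float

def evaluate(pairs: List[Pair], tau: float = TAU) -> dict:
    n = len(pairs)
    committed = violations = 0
    total_commit = 0.0
    for pr in pairs:
        ds = decide(p_yes(pr.s_yes, pr.s_no), tau)
        dn = decide(p_yes(pr.neg_yes, pr.neg_no), tau)
        total_commit += commitment(p_yes(pr.s_yes, pr.s_no))
        if ds != "Uncertain":
            committed += 1
        # Assumed violation: S and not-S receive the same committed label.
        # Abstaining on either side can never count as a violation.
        if ds != "Uncertain" and ds == dn:
            violations += 1
    return {
        "coverage": committed / n,          # fraction of committed decisions on S
        "mean_commitment": total_commit / n,
        "violation_rate": violations / n,   # over all pairs (assumed denominator)
    }

# Usage: an always-abstaining model vs. one that says YES to both S and not-S.
abstainer = [Pair(-0.7, -0.7, -0.7, -0.7)] * 4
overcommitter = [Pair(-0.05, -3.0, -0.05, -3.0)] * 4
print(evaluate(abstainer))  # {'coverage': 0.0, 'mean_commitment': 0.0, 'violation_rate': 0.0}
r = evaluate(overcommitter)
print(r["coverage"], r["violation_rate"])  # 1.0 1.0
```

Under these assumptions the abstainer scores a perfect violation rate with zero coverage, which is exactly the vacuous coherence the abstract warns about, while the overcommitter reaches full coverage at the cost of violating negation consistency on every pair.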
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Funding: Yes, the presenting author of this submission falls under ICLR’s funding aims, and funding would significantly impact their ability to attend the workshop in person.
Submission Number: 140