Commitment-Aware Axiomatic Coherence: Measuring Non-Vacuous Consistency in LLM Logical Reasoning

Published: 05 Mar 2026, Last Modified: 08 Mar 2026 · ICLR 2026 Workshop LLM Reasoning · CC BY 4.0
Track: long paper (up to 10 pages)
Keywords: logical reasoning evaluation, commitment-aware coherence, negation-consistency violation, abstention and coverage, FOLIO benchmark
TL;DR: Coherence checks can look good by abstention; adding a commitment metric alongside negation-violation reveals an abstention–contradiction frontier on FOLIO v0.0 (204 ex.).
Abstract: Large language models (LLMs) are increasingly used for logical tasks, yet they frequently exhibit contradictions across closely related queries. A natural response is to measure logical coherence by checking axioms such as negation consistency. However, we show that coherence can be vacuous: a model can appear consistent by refusing to commit to either a statement or its negation. We propose commitment-aware axiomatic coherence, a lightweight evaluation protocol that complements a standard negation-coherence check with a commitment score measuring how much probability mass the model assigns to entailed vs. refuted outcomes (as opposed to abstention/uncertainty). Using a deterministic log-probability elicitation procedure (YES/NO) and a simple 3-way decision rule (True/False/Uncertain), we evaluate four open LLMs on the public FOLIO v0.0 validation split. Results reveal a clear frontier: some models achieve low contradiction rates primarily by abstaining (low coverage), while others achieve high coverage at the cost of pervasive negation-coherence violations. Our findings argue that reliable logical reasoning evaluation requires reporting both coherence and non-vacuous commitment, not coherence alone.
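The protocol sketched in the abstract can be illustrated as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: the confidence threshold `tau`, the renormalization over the two options, and the exact definitions of coverage and violation rate are all hypothetical choices made here for illustration.

```python
import math

def decide(logp_yes: float, logp_no: float, tau: float = 0.75):
    """3-way decision (True/False/Uncertain) from YES/NO log-probabilities.

    Probability mass is renormalized over the two options; if neither side
    reaches the (assumed) confidence threshold `tau`, the model abstains.
    """
    p_yes, p_no = math.exp(logp_yes), math.exp(logp_no)
    total = p_yes + p_no
    p_yes, p_no = p_yes / total, p_no / total
    if p_yes >= tau:
        return "True"
    if p_no >= tau:
        return "False"
    return "Uncertain"

def commitment_and_violation(pairs, tau: float = 0.75):
    """Given (statement, negation) YES/NO log-prob pairs, report:

    - coverage: fraction of pairs where the model commits (non-Uncertain)
      on both the statement and its negation;
    - violation rate: fraction of covered pairs where the statement and its
      negation receive the SAME committed label, i.e. a negation-coherence
      violation (both affirmed or both refuted).
    """
    committed = violations = 0
    for (s_yes, s_no), (n_yes, n_no) in pairs:
        d_stmt = decide(s_yes, s_no, tau)
        d_neg = decide(n_yes, n_no, tau)
        if d_stmt != "Uncertain" and d_neg != "Uncertain":
            committed += 1
            if d_stmt == d_neg:  # e.g. phi and not-phi both labeled "True"
                violations += 1
    coverage = committed / len(pairs)
    violation_rate = violations / committed if committed else 0.0
    return coverage, violation_rate
```

A model that always abstains scores zero violations but also zero coverage, which is exactly the vacuous-coherence failure mode the commitment score is meant to expose; reporting the two numbers together traces out the abstention-contradiction frontier described above.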
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 140