Commitment-Aware Axiomatic Coherence: Measuring Non-Vacuous Consistency in LLM Logical Reasoning
Track: long paper (up to 10 pages)
Keywords: logical reasoning evaluation, commitment-aware coherence, negation-consistency violation, abstention and coverage, FOLIO benchmark
TL;DR: Coherence checks can look good by abstention; adding a commitment metric alongside negation-violation reveals an abstention–contradiction frontier on FOLIO v0.0 (204 ex.).
Abstract: Large language models (LLMs) are increasingly used for logical tasks, yet they
frequently exhibit contradictions across closely related queries. A natural response
is to measure logical coherence by checking axioms such as negation consistency.
However, we show that coherence can be vacuous: a model can appear consistent by
refusing to commit to either a statement or its negation. We propose commitment-aware axiomatic coherence, a lightweight evaluation protocol that complements a
standard negation-coherence check with a commitment score measuring how much
probability mass the model assigns to entailed vs. refuted outcomes (as opposed to
abstention/uncertainty). Using a deterministic log-probability elicitation procedure
(YES/NO) and a simple 3-way decision rule (True/False/Uncertain), we evaluate
four open LLMs on the public FOLIO v0.0 validation split. Results reveal a clear
frontier: some models achieve low contradiction rates primarily by abstaining (low
coverage), while others achieve high coverage at the cost of pervasive negation-coherence violations. Our findings argue that reliable logical reasoning evaluation
requires reporting both coherence and non-vacuous commitment, not coherence
alone.
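The protocol sketched in the abstract can be illustrated with a minimal example. This is a hypothetical sketch, not the paper's implementation: the threshold `tau`, the function names, and the renormalization over only the YES/NO tokens are all assumptions introduced here for illustration.

```python
import math

def decide(logp_yes: float, logp_no: float, tau: float = 0.8):
    """3-way decision (True/False/Uncertain) from YES/NO log-probs.

    tau is a hypothetical commitment threshold, not the paper's value.
    """
    # Renormalize mass over the two answer tokens only (an assumption).
    p_yes = math.exp(logp_yes) / (math.exp(logp_yes) + math.exp(logp_no))
    if p_yes >= tau:
        return "True", p_yes
    if p_yes <= 1 - tau:
        return "False", p_yes
    return "Uncertain", p_yes

def commitment(p_yes: float) -> float:
    """Mass on the favored committed outcome; 0.5 means fully uncertain."""
    return max(p_yes, 1 - p_yes)

def negation_violation(dec_s: str, dec_neg_s: str) -> bool:
    """Negation consistency: a statement and its negation must not both
    receive the same committed label (both 'True' or both 'False')."""
    return dec_s == dec_neg_s and dec_s in ("True", "False")
```

Under this sketch, a model that abstains everywhere trivially commits no negation violation but scores 0.5 commitment on every item, which is exactly the vacuous-coherence failure mode the commitment score is meant to expose.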
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 140