Track: long paper (up to 10 pages)
Keywords: LLM reasoning, logical consistency, multi-query contradictions, constraint solving, neuro-symbolic methods, satisfiability, benchmark construction, coherence metrics
Abstract: Large language models frequently produce mutually inconsistent answers when reasoning
over multiple related queries. We study case-file logical consistency: maintaining a
globally satisfiable belief state across interdependent queries. We introduce a
benchmark of 390 multi-query reasoning instances with entailment/contradiction/unknown
labels, and propose set-level metrics including Case Satisfiability Rate, Contradiction
Density and Revision Cost. Our solver-augmented approach extracts commitments, verifies
global satisfiability and performs counterexample-guided repair. Across four reasoning
domains, our method substantially reduces cross-query contradictions (SetCons rises from 0.56 to 0.94)
while preserving per-query accuracy, demonstrating that global coherence is critical for
robust multi-query reasoning.
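As a rough illustration of the global satisfiability check described above, the sketch below treats each query's extracted commitments as propositional clauses and brute-forces satisfiability over the pooled set for a case. The clause encoding, the atom names, and the `case_satisfiability_rate` definition (the fraction of cases whose pooled commitments are jointly satisfiable) are assumptions for illustration, not the submission's implementation. A practical system would hand the pooled constraints to a SAT or SMT solver instead of enumerating assignments.

```python
from itertools import product

# Assumed encoding: a literal is (atom_name, polarity); a clause is a disjunction of literals.
Literal = tuple[str, bool]
Clause = list[Literal]


def is_satisfiable(clauses: list[Clause]) -> bool:
    """Brute-force check: does some truth assignment satisfy every clause?"""
    atoms = sorted({atom for clause in clauses for atom, _ in clause})
    for values in product([False, True], repeat=len(atoms)):
        assignment = dict(zip(atoms, values))
        # A clause holds if at least one of its literals matches the assignment.
        if all(any(assignment[a] == pol for a, pol in clause) for clause in clauses):
            return True
    return False


def case_satisfiability_rate(cases: list[list[Clause]]) -> float:
    """Hypothetical reading of the metric: share of cases whose commitments are jointly satisfiable."""
    if not cases:
        return 0.0
    return sum(is_satisfiable(case) for case in cases) / len(cases)


# Usage: each answer is locally plausible, but together the three commitments clash.
rule = [("alice_guilty", False), ("bob_guilty", True)]  # alice_guilty -> bob_guilty
consistent_case = [rule, [("alice_guilty", True)]]
contradictory_case = consistent_case + [[("bob_guilty", False)]]
print(case_satisfiability_rate([consistent_case, contradictory_case]))  # 0.5
```

Enumeration is exponential in the number of atoms, so it only serves to make the definition concrete; the contradictory case shows how three individually reasonable answers can leave no satisfying assignment.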
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Funding: Yes, the presenting author of this submission falls under ICLR’s funding aims, and funding would significantly impact their ability to attend the workshop in person.
Submission Number: 5