LogicPhase: Reasoning Phase Transitions in Large Language Models Under Controlled Contradiction Density

21 Mar 2026 (modified: 06 Apr 2026) · ICLR 2026 Workshop LLM Reasoning · Withdrawn Submission · CC BY 4.0
Track: long paper (up to 10 pages)
Keywords: logical reasoning, large language models, benchmark, contradiction detection, phase transition, overcommitment, consistency, prompt robustness
TL;DR: LLMs exhibit a phase-transition-like reasoning collapse as contradiction density increases, becoming more confidently wrong rather than more uncertain.
Abstract: While large language models (LLMs) have achieved impressive performance on standard logical reasoning benchmarks, these benchmarks share a structural blind spot: all problem instances are internally consistent. We argue this design choice masks a critical failure mode---reasoning collapse under inconsistency. We introduce LogicPhase, a synthetic benchmark that systematically varies contradiction exposure level ρ ∈ [0, 0.4], the fraction of problem instances seeded with inconsistency-inducing logical structure, across 300 instances with exact symbolic gold labels. We evaluate a frontier instruction-tuned LLM under three prompting regimes---direct, chain-of-thought (CoT), and abstain-aware---and define four metrics: Entailment Accuracy (EA), Inconsistency Detection Rate (IDR), Overcommitment Rate (OR), and Paraphrase Stability (PS). Our central finding is that accuracy degrades non-linearly with ρ, exhibiting a phase-transition-like collapse near ρ ≈ 0.2, accompanied by a sharp rise in overcommitment: confident true/false predictions when the correct response is Inconsistent or Unknown. Abstain-aware prompting partially mitigates collapse but does not eliminate it. We connect this behavior to the statistical mechanics of satisfiability phase transitions and argue that contradiction-density evaluation should become a standard axis in logic benchmark design.
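The four metrics named in the abstract can be sketched as simple aggregate functions over (gold, prediction) pairs. The record format and exact definitions below are assumptions inferred from the abstract's descriptions, not the paper's formulas; labels are assumed to be drawn from {"True", "False", "Inconsistent", "Unknown"}.

```python
# Hypothetical sketch of the EA/IDR/OR/PS metrics described in the abstract.
# `records` is a list of (gold_label, predicted_label) pairs; label strings
# and metric definitions are assumptions, not the paper's exact formulas.

def entailment_accuracy(records):
    """EA: accuracy on instances whose gold label is True or False."""
    scored = [(g, p) for g, p in records if g in ("True", "False")]
    return sum(g == p for g, p in scored) / len(scored)

def inconsistency_detection_rate(records):
    """IDR: fraction of Inconsistent instances the model flags as such."""
    inc = [p for g, p in records if g == "Inconsistent"]
    return sum(p == "Inconsistent" for p in inc) / len(inc)

def overcommitment_rate(records):
    """OR: confident True/False answers where gold is Inconsistent/Unknown."""
    soft = [p for g, p in records if g in ("Inconsistent", "Unknown")]
    return sum(p in ("True", "False") for p in soft) / len(soft)

def paraphrase_stability(groups):
    """PS: fraction of paraphrase groups answered unanimously.

    `groups` is a list of prediction lists, one list per paraphrase set.
    """
    return sum(len(set(preds)) == 1 for preds in groups) / len(groups)
```

Under these definitions, the "phase transition" finding corresponds to OR rising sharply while EA drops as ρ crosses ≈ 0.2.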
Presenter: ~Poojak_Patel1
Format: Yes, the presenting author will attend in person if this work is accepted to the workshop.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 190