Logical Non-Commutativity in Large Language Models: Premise Order Should Not Matter, But It Does
Track: tiny / short paper (up to 4 pages)
Keywords: logical reasoning, large language models, permutation invariance, premise order, order sensitivity, benchmark, commutativity, chain-of-thought
TL;DR: LLMs violate a basic property of logic: their conclusions change when premise order changes, with invariance near random chance for inconsistent worlds.
Abstract: Logical reasoning is, by definition, invariant to the order in which premises are presented: the entailment relation between a set of premises and a conclusion depends only on the content of the premises, not their sequence. We evaluate whether large language models (LLMs) satisfy this property. Using a controlled synthetic benchmark of 144 logical worlds with exact symbolic gold labels, we generate up to four premise-order permutations per world and query a frontier LLM under direct and chain-of-thought (CoT) prompting. We introduce the Permutation Invariance Rate (PIR), the fraction of permutation pairs on which a model produces identical outputs, as a diagnostic metric. We find that PIR is 0.49 under direct prompting and 0.57 under CoT: LLMs agree with themselves across premise orderings only about half the time. PIR is lowest for logically inconsistent worlds (0.32), suggesting that inconsistency detection is especially order-sensitive. We argue that permutation invariance is a missing evaluation axis in LLM reasoning benchmarks and that the observed logical non-commutativity reflects a fundamental limitation of sequence-based reasoning.
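The PIR metric described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' released code: it assumes outputs are grouped per world, that agreement is exact string equality of model outputs, and that pairs are pooled across all worlds before averaging (the paper may instead average per world; the data below is purely hypothetical).

```python
from itertools import combinations

def permutation_invariance_rate(outputs_by_world):
    """Fraction of premise-order permutation pairs on which the model
    produces identical outputs, pooled over all worlds (assumption)."""
    agree, total = 0, 0
    for outputs in outputs_by_world.values():
        # compare every pair of permutations of the same world
        for a, b in combinations(outputs, 2):
            agree += (a == b)
            total += 1
    return agree / total if total else float("nan")

# hypothetical outputs: two worlds, up to four permutations each
outs = {
    "w1": ["entailed", "entailed", "not entailed"],  # 1 of 3 pairs agree
    "w2": ["inconsistent", "inconsistent"],          # 1 of 1 pair agrees
}
print(permutation_invariance_rate(outs))  # 0.5
```

A PIR of 1.0 would mean the model's answer never depends on premise order; the paper's reported values of 0.49 (direct) and 0.57 (CoT) sit far below that ceiling.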
Presenter: ~Poojak_Patel1
Format: Yes, the presenting author will attend in person if this work is accepted to the workshop.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 191