Logical Non-Commutativity in Large Language Models: Premise Order Should Not Matter, But It Does
Track: tiny / short paper (up to 4 pages)
Keywords: logical reasoning, large language models, permutation invariance, premise order, order sensitivity, benchmark, commutativity, chain-of-thought
TL;DR: LLMs violate a basic property of logic: their conclusions change when premise order changes, with invariance near random chance for inconsistent worlds.
Abstract: Logical reasoning is, by definition, invariant to the order in which premises are presented: the entailment relation between a set of premises and a conclusion depends only on the content of the premises, not their sequence. We evaluate whether large language models (LLMs) satisfy this property. Using a controlled synthetic benchmark of 144 logical worlds with exact symbolic gold labels, we generate up to four premise-order permutations per world and query a frontier LLM under direct and chain-of-thought (CoT) prompting. We introduce the Permutation Invariance Rate (PIR), the fraction of permutation pairs on which a model produces identical outputs, as a diagnostic metric. We find that PIR is 0.49 under direct prompting and 0.57 under CoT: LLMs agree with themselves across premise orderings only about half the time. PIR is lowest for logically inconsistent worlds (0.32), suggesting that inconsistency detection is especially order-sensitive. We argue that permutation invariance is a missing evaluation axis in LLM reasoning benchmarks and that the observed logical non-commutativity reflects a fundamental limitation of sequence-based reasoning.
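The PIR metric described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' released code: it assumes outputs are grouped per world, that agreement is exact string equality of model outputs, and that pairs are pooled across all worlds before averaging (the paper may instead average per world; the data below is purely hypothetical).

```python
from itertools import combinations

def permutation_invariance_rate(outputs_by_world):
    """Fraction of premise-order permutation pairs on which the model
    produces identical outputs, pooled over all worlds (assumption)."""
    agree, total = 0, 0
    for outputs in outputs_by_world.values():
        # compare every pair of permutations of the same world
        for a, b in combinations(outputs, 2):
            agree += (a == b)
            total += 1
    return agree / total if total else float("nan")

# hypothetical outputs: two worlds, up to four permutations each
outs = {
    "w1": ["entailed", "entailed", "not entailed"],  # 1 of 3 pairs agree
    "w2": ["inconsistent", "inconsistent"],          # 1 of 1 pair agrees
}
print(permutation_invariance_rate(outs))  # 0.5
```

A PIR of 1.0 would mean the model's answer never depends on premise order; the paper's reported values of 0.49 (direct) and 0.57 (CoT) sit far below that ceiling.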
Presenter: ~Poojak_Patel1
Format: Yes, the presenting author will attend in person if this work is accepted to the workshop.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 191