Beyond Raw Competence: Logical Equivariance in Diffusion Language Models

Published: 26 May 2026, Last Modified: 26 May 2026
Venue: ICML 2026 FoGen Workshop Poster
License: CC BY 4.0
Keywords: Diffusion Language Models, Logical Equivariance, Presentation Bias, Structured Reasoning, Positional Encodings, Logical Trajectory Supervision
TL;DR: Diffusion language models lack logical equivariance and suffer from presentation bias due to 1D positional encodings; achieving robust reasoning requires combining trajectory supervision with explicit multi-view geometric exposure.
Abstract: Diffusion language models (dLLMs) replace strict left-to-right generation with iterative denoising, theoretically bypassing the causal serialization bottleneck. However, do they truly solve the underlying logical state, or do they merely memorize favorable 1D presentations? We argue that genuine reasoning requires logical equivariance (LE), defined as consistency across symmetric transformations of an input. Introducing Sudoku-AC-Orbit, an exact correctness-preserving benchmark, we expose a severe ``orbit gap'': models solve canonical 1D layouts but fail catastrophically when the exact same logical constraints are geometrically transformed. We theoretically trace this fragility to a positional obstruction, proving via the Row-major Toeplitz--$D_4$ Rigidity Theorem that standard 1D positional encodings inherently clash with 2D grid symmetries and leak presentation information. Finally, while Logical Trajectory Supervision (LTS) massively boosts raw solver competence, achieving presentation-stable reasoning requires explicit multi-view exposure (LTS+$D_4$). Ultimately, scaling raw competence does not automatically yield the logical equivariance necessary for robust machine intelligence.
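To make the notion of an orbit concrete, the sketch below enumerates the eight $D_4$ views (four rotations, each with and without a horizontal flip) of a solved 9x9 grid, checks that every view is still a valid Sudoku solution, and shows that the row-major 1D serialization differs across views. It is a minimal illustration under stated assumptions, not the Sudoku-AC-Orbit benchmark itself: the helper names (`rot90`, `flip`, `d4_orbit`, `is_valid_solution`, `flatten`, `rotated_position`, `row_major_index`) and the pattern-generated example grid are hypothetical and do not come from the paper.

```python
# Illustrative sketch only: all names below are hypothetical, not from the paper.

N = 9  # grid side length


def rot90(g):
    """Rotate a grid 90 degrees clockwise; cell (r, c) moves to (c, N-1-r)."""
    return [list(row) for row in zip(*g[::-1])]


def flip(g):
    """Mirror a grid horizontally (reverse each row)."""
    return [row[::-1] for row in g]


def d4_orbit(g):
    """Return the 8 views of g under the dihedral group D4."""
    views = []
    cur = [list(r) for r in g]
    for _ in range(4):
        views.append(cur)
        views.append(flip(cur))
        cur = rot90(cur)
    return views


def is_valid_solution(g):
    """Check that every row, column, and 3x3 box contains the digits 1..9."""
    digits = set(range(1, 10))
    rows = all(set(r) == digits for r in g)
    cols = all(set(c) == digits for c in zip(*g))
    boxes = all(
        {g[r + dr][c + dc] for dr in range(3) for dc in range(3)} == digits
        for r in range(0, N, 3)
        for c in range(0, N, 3)
    )
    return rows and cols and boxes


def flatten(g):
    """Row-major 1D serialization, the layout a sequence model would see."""
    return [v for row in g for v in row]


def row_major_index(r, c):
    """1D position of cell (r, c) under row-major flattening."""
    return N * r + c


def rotated_position(r, c):
    """Where cell (r, c) lands after one clockwise rotation (matches rot90)."""
    return c, N - 1 - r


# A solved grid built from a standard shifting pattern (assumed example data).
solved = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(N)] for r in range(N)]

views = d4_orbit(solved)
all_valid = all(is_valid_solution(v) for v in views)
distinct_serializations = len({tuple(flatten(v)) for v in views})

# Two horizontally adjacent cells sit 1 apart in 1D; after a rotation the
# same pair of cells is vertically adjacent and sits N apart in 1D.
a, b = (4, 4), (4, 5)
offset_before = row_major_index(*b) - row_major_index(*a)
offset_after = row_major_index(*rotated_position(*b)) - row_major_index(
    *rotated_position(*a)
)
```

The last few lines hint at why a 1D positional scheme can leak presentation information: the same logical adjacency is encoded by different 1D offsets depending on the view, even though the constraint set is unchanged. This is only an intuition aid and makes no claim about the statement or proof of the rigidity theorem.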
Submission Number: 221