The Spurious Composition Problem: Conditional Independence as a Necessary and Sufficient Condition for Systematic Generalization
Keywords: Systematic generalization, compositional generalization, conditional independence, information theory, spurious composition, neural networks
TL;DR: We prove that systematic generalization fails due to "spurious compositions" and introduce CCI, a metric and training method that identifies and removes these context leaks to achieve state-of-the-art performance.
Abstract: Despite rapid progress in deep learning, neural networks routinely fail to generalise compositionally: a model that handles individual concepts during training cannot reliably combine them in novel ways at test time. We provide the first tight, two-way information-theoretic characterisation of this failure. We introduce Conditional Compositional Independence (CCI), which measures how much the learned representation of one compositional component leaks information about its co-occurring partners. Our main result is that $\epsilon$-systematic generalisation is equivalent to $\epsilon$-CCI, up to Lipschitz constants: (i) $CCI = 0$ is necessary for zero composition gap (Theorem 3.1); and (ii) $\epsilon$-CCI implies an $O(\sqrt{\epsilon})$ composition gap (Theorem 3.2). We further prove that standard ERM violates CCI on any realistic compositional training distribution (Theorem 3.3), and that CCI-constrained learning requires a factor of $|\mathcal{P}|$ fewer samples than unconstrained ERM (Theorem 3.4), where $|\mathcal{P}|$ is the primitive vocabulary size. Building on this theory, we (i) characterise which architectures can satisfy CCI (Theorem 3.5); (ii) provide an efficient, differentiable estimator (Theorem A.4); and (iii) derive \textsc{CCI-Train}, a regularised training algorithm. Experiments on SCAN, COGS, CLUTRR, and our new controlled COMPSYM benchmark confirm all theoretical predictions and show that \textsc{CCI-Train} improves systematic generalisation accuracy by 30--46 percentage points over the strongest baseline.
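To make the notion of a representation "leaking information about its co-occurring partners" concrete, the sketch below computes a plug-in estimate of the conditional mutual information $I(Z; B \mid A)$ on toy discrete data. This is only one plausible reading of such a leak, not the paper's CCI definition or its differentiable estimator (Theorem A.4), neither of which is reproduced here. The function name, the toy primitives, and the two hypothetical encoders are all assumptions made for illustration.

```python
import math
from collections import Counter


def conditional_mutual_information(samples):
    """Plug-in estimate of I(Z; B | A) in nats from (a, b, z) triples.

    a: identity of a component, b: its co-occurring partner,
    z: a discrete code standing in for the learned representation of a.
    """
    n = len(samples)
    n_abz = Counter(samples)
    n_a = Counter(a for a, _, _ in samples)
    n_ab = Counter((a, b) for a, b, _ in samples)
    n_az = Counter((a, z) for a, _, z in samples)
    total = 0.0
    for (a, b, z), c in n_abz.items():
        # p(a,b,z) * log[ p(a,b,z) p(a) / (p(a,b) p(a,z)) ], written with counts
        total += (c / n) * math.log(c * n_a[a] / (n_ab[(a, b)] * n_az[(a, z)]))
    return total


# Hypothetical toy vocabulary (illustrative, not taken from the benchmarks).
primitives = ["jump", "walk"]
partners = ["twice", "thrice"]

# Hypothetical "clean" encoder: the code for a depends on a alone.
clean = [(a, b, a) for a in primitives for b in partners]
# Hypothetical "leaky" encoder: the code for a also memorises the partner b.
leaky = [(a, b, (a, b)) for a in primitives for b in partners]

clean_leak = conditional_mutual_information(clean)
leaky_leak = conditional_mutual_information(leaky)
print(clean_leak, leaky_leak)
```

In this toy setting the clean encoder carries no information about the partner once the component is known, while the leaky encoder carries all of it, namely $H(B \mid A) = \ln 2$ nats. A plug-in estimate like this is biased on small samples and applies only to discrete codes, so it serves as an illustration rather than a practical training signal.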
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 58