When Does Composition Compose? A PAC-Theoretic Framework for Compositional Faithfulness, Safety Certificates, and Training Dynamics
Keywords: PAC Learning, Compositional Generalization, OOD Safety, Sample Complexity, Mechanistic Interpretability, Representation Learning
TL;DR: A PAC-theoretic framework that provides certifiable safety bounds and identifies a "compositional collapse" phase transition in neural network generalization.
Abstract: Compositional generalization, the ability to understand and produce novel combinations of familiar concepts, is widely regarded as a cornerstone of robust intelligence, yet modern neural networks routinely fail at it. Despite extensive empirical study, there is no general-purpose, architecture-agnostic, finite-sample theory that simultaneously (i) characterizes when a learned representation will compose faithfully out-of-distribution, (ii) yields a certifiable safety guarantee computable from primitive-level data alone, and (iii) predicts the training dynamics of compositional learning. We fill this gap. We introduce Compositional Faithfulness ($(\epsilon, \delta)$-CF), a graded, measurable property of a representation that quantifies how accurately the model's latent operator mirrors the true compositional structure of the task. We prove five results: a PAC sample-complexity separation showing that CF models require exponentially fewer training examples than unconstrained models (Theorem 3.1); a generalization bound whose three decoupled terms correspond to representation quality, reliability, and statistical estimation (Theorem 3.2); a phase-transition theorem for training dynamics that identifies a critical regularization strength $\lambda^*$ below which representations undergo compositional collapse (Theorem 3.4); a certifiable safety radius computable without observing any novel compositions (Theorem 3.5); and an identifiability result establishing that the latent composition operator is unique up to change of basis, which opens a mechanistic interpretability agenda (Theorem 3.6). We validate all five results on SCAN, COGS, CFQ, multi-hop QA (HotpotQA), and a large-scale LLM audit of LLaMA-3-8B and Mistral-7B. Our framework yields two practical tools: a pre-deployment safety audit that predicts OOD failure from primitive-level data, and an interpretability probe for the latent composition structure of arbitrary encoders.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 60