When Does Disentanglement Enable Compositional Generalization? A Transfer Bound and Its Empirical Validation
Keywords: compositional generalization, disentangled representations, few-shot learning, sequence-to-sequence models, mutual information, transfer bound, out-of-distribution generalization, SCAN, COGS, representation learning
TL;DR: A transfer bound makes disentanglement a measurable predictor of compositional generalization, validated on SCAN and COGS.
Abstract: Neural sequence models reliably learn the primitives of a compositional grammar but fail to recombine those primitives into structures they were not explicitly trained on. We argue this is a representational failure, not a coverage one: when the encoder hidden state entangles which primitive is present with which relation governs it, the decoder exploits spurious co-occurrences that hold during training but break on novel compositions. We formalise this with a transfer bound in which the mutual information between primitive and relational subspaces of the hidden state appears as an additive penalty on target error. We then operationalise the bound with DisentangledLSTM, which combines a hard factorisation of the encoder state with auxiliary classification supervision, and Scaffolded Inference, a grammar-guided test-time decomposition. On the SCAN jump split, our model attains 99.2% accuracy at four support examples versus 38.3% for a matched bidirectional-LSTM baseline; across four splits, Mutual Information Gap correlates with transfer accuracy at r≈0.9 (n=4). Preliminary results on COGS show the same qualitative gap on a naturalistic benchmark. Code and configurations will be released anonymously upon acceptance.
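The Mutual Information Gap mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' estimator: it assumes the standard definition of MIG (for each ground-truth factor, the gap in mutual information between the two most informative code dimensions, normalised by the factor's entropy, then averaged over factors) and it assumes that hidden-state dimensions have already been discretised. The function names and the toy primitive/relation data are hypothetical.

```python
import math
from collections import Counter

def entropy(xs):
    """Shannon entropy (in nats) of a sequence of discrete labels."""
    n = len(xs)
    return -sum((c / n) * math.log(c / n) for c in Counter(xs).values())

def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for paired discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def mutual_information_gap(codes, factors):
    """Assumed standard MIG; needs at least two code dimensions.

    codes:   list of code dimensions, each a list of discretised values
    factors: list of ground-truth factors, each a list of labels
    """
    gaps = []
    for f in factors:
        h = entropy(f)
        if h == 0:  # a constant factor carries no information; skip it
            continue
        mis = sorted((mutual_information(c, f) for c in codes), reverse=True)
        gaps.append((mis[0] - mis[1]) / h)
    return sum(gaps) / len(gaps)

# Hypothetical toy data: every combination of a binary primitive and relation.
primitive = [0, 0, 1, 1]
relation = [0, 1, 0, 1]

# Factorised code: one dimension per factor.
disentangled = [primitive, relation]
# Entangled code: both dimensions encode the joint (primitive, relation) label.
joint = [2 * p + r for p, r in zip(primitive, relation)]
entangled = [joint, joint]

mig_disentangled = mutual_information_gap(disentangled, [primitive, relation])
mig_entangled = mutual_information_gap(entangled, [primitive, relation])
print(mig_disentangled, mig_entangled)
```

On this toy data the factorised code scores close to 1 and the entangled code close to 0, which is the direction of the relationship the abstract reports between MIG and transfer accuracy. With real encoder states the scores would depend on the discretisation scheme, which is an assumption here.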
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 65