Representational Homomorphism Error Predicts Compositional Generalization In Language Models

Published: 23 Sept 2025, Last Modified: 29 Oct 2025, NeurReps 2025 Proceedings, CC BY 4.0
Keywords: Compositionality, Out-Of-Distribution Generalization, Language Models
Abstract: Compositional generalization—the ability to understand novel combinations of familiar components—remains a significant challenge for neural networks despite their success in many language tasks. Current evaluation methods focus on behavioral measures that reveal \emph{when} models fail to generalize compositionally, but provide limited insight into \emph{why} these failures occur at the representational level. We introduce \textit{Homomorphism Error} (HE), a structural metric that quantifies how well neural network representations preserve compositional operations by measuring deviations from approximate homomorphisms between expression spaces and their internal representations. Through controlled experiments on SCAN-style synthetic compositional tasks with small-scale Transformers, we demonstrate that HE is a strong predictor of out-of-distribution (OOD) generalization, explaining OOD compositional accuracy with $R^2 = 0.73$. Furthermore, our analysis reveals that while model architecture has minimal impact on compositional structure and training data coverage exhibits threshold effects, noise injection systematically degrades compositional representations in predictable ways. Importantly, we find that different aspects of compositionality—unary operations (modifiers) versus binary operations (sequence composition)—exhibit distinct sensitivities to distributional shifts, with modifier representations being particularly vulnerable to spurious correlations. These findings provide new mechanistic insights into compositional learning and establish homomorphism error as a valuable diagnostic tool for developing more robust neural architectures and training methods. Code and data will be made publicly available.
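The abstract does not spell out the exact formula for HE, so the following is a minimal sketch of one plausible formalization, assuming HE is the mean normalized deviation of composed representations from a homomorphism. The function names `homomorphism_error`, `compose_expr`, and `compose_rep` are hypothetical and introduced only for illustration.

```python
import numpy as np

def homomorphism_error(reps, compose_expr, compose_rep, pairs):
    """Hypothetical sketch of Homomorphism Error (HE).

    reps: dict mapping an expression e to its representation vector f(e)
    compose_expr: binary operation on expressions, (x, y) -> x o y
    compose_rep: candidate composition operation in representation space
    pairs: iterable of (x, y) expression pairs to evaluate

    HE is taken here as the mean normalized deviation
        || f(x o y) - g(f(x), f(y)) || / || f(x o y) ||,
    i.e. how far f is from being a homomorphism with respect to (o, g).
    """
    errors = []
    for x, y in pairs:
        # Representation of the composed expression, as produced by the model.
        target = reps[compose_expr(x, y)]
        # Composition applied directly in representation space.
        pred = compose_rep(reps[x], reps[y])
        # Small epsilon guards against division by a zero-norm representation.
        errors.append(np.linalg.norm(target - pred) / (np.linalg.norm(target) + 1e-8))
    return float(np.mean(errors))
```

Under this reading, a perfectly homomorphic representation yields HE = 0, and larger values indicate that composition in expression space is not mirrored in representation space.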
Submission Number: 142