On the Statistical Mechanisms of Distributional Compositional Generalization

Jingwen Fu; Nanning Zheng

Published: 01 May 2025, Last Modified: 23 Jul 2025, ICML 2025 poster, CC BY 4.0

Abstract: Distributional Compositional Generalization (DCG) refers to the ability to tackle tasks from new distributions by leveraging knowledge of concepts learned from supporting distributions. In this work, we explore the statistical mechanisms of DCG, which have been largely overlooked in previous studies. By formulating the problem statistically, this paper addresses two key research questions: 1) Can a method for one DCG problem be applied to another? 2) What statistical properties indicate a learning algorithm's capacity for knowledge composition in DCG tasks? To address the first question, we propose an invariant measure that provides a dimension along which all methods converge. This measure underscores the critical role of data in enabling improvements without trade-offs. As for the second question, we show that, once the impacts of insufficient data and knowledge composition are decoupled, the ability of a learning algorithm to compose knowledge depends on the compatibility and sensitivity between the learning algorithm and the composition rule. In summary, the statistical analysis of generalization mechanisms provided in this paper deepens our understanding of compositional generalization and offers complementary evidence on the importance of data in DCG tasks.

Lay Summary: (1) This study investigates the statistical mechanisms of Distributional Compositional Generalization (DCG), an area that has received limited attention in prior research. (2) We focus on two central research questions: (a) Can a solution to one DCG task be effectively transferred to another? (b) What statistical characteristics reveal a learning algorithm’s ability to perform knowledge composition in DCG scenarios? (3) Our findings highlight the pivotal role of data in enabling performance gains without trade-offs. Furthermore, a learning algorithm's capacity for knowledge composition depends on the compatibility and sensitivity between the algorithm and the underlying composition rule.

Primary Area: Theory->Learning Theory

Keywords: Compositional Generalization

Submission Number: 5565
