Abstract: Compositional generalization tests are commonly used to estimate the compositionality of LLMs. However, such tests (1) do not examine the explanations LLMs give for the functions they fit and (2) rely on consistency with a fixed target function over a pre-partitioned test set as the criterion, which hinders obtaining explainable and convincing estimates and analyses of LLM compositionality. In this work, we propose a program-generation perspective that treats the programs generated by LLMs as externalized explanations and estimates the compositionality of LLMs with the help of complexity-based theory. This perspective addresses the explainability limitations of compositional generalization tests and offers a new way to analyze how LLMs realize compositionality. We conduct experiments and analyses of existing advanced LLMs from this perspective on a string-to-grid task, and find that LLMs exhibit a variety of compositionality characterizations as well as compositionality deficiencies.
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: compositionality
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 156