Keywords: compositionality, causality, counterfactuals, compositional generalization, ood
TL;DR: A key difference between the causal and compositional views of generalization is that counterfactuals assign non-zero probability only to compositions that adhere to a prior causal world model.
Abstract: Large models trained on vast datasets can achieve low training and test loss and thus generalize statistically.
However, their most interesting properties, such as good transfer performance or extrapolation,
concern out-of-distribution (OOD) data.
One desired OOD property is compositional generalization, where models generalize to unseen feature combinations.
While compositional generalization promises good performance on a wide range of OOD scenarios, it does not account for the plausibility of such combinations.
Hallucinations in large models are a ubiquitous example.
Building on recent advances in Bayesian causal inference, we propose a unified perspective of counterfactual and compositional generalization. We use a causal world model to reason about the plausibility of unseen combinations.
By introducing a Bayesian prior, we show that counterfactual generalization is a special case of compositionality, restricted to realistic combinations.
This perspective allows us to formally characterize hallucinations
and opens up new research directions for equipping generative AI models with a formally motivated "switch" between realistic and non-realistic/creative modes.
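To make the TL;DR's claim concrete, here is a minimal, hypothetical sketch (not taken from the paper): a purely compositional distribution puts non-zero probability on every unseen feature combination, while restricting it with a causal world-model prior and renormalizing leaves non-zero probability only on plausible combinations. The feature names and the plausibility rule below are illustrative assumptions.

# Minimal sketch (illustrative, not the paper's method): compositional vs.
# counterfactual distributions over unseen feature combinations.
from itertools import product

features = {"object": ["boat", "car"], "terrain": ["water", "road"]}

# Compositional view: every combination of features gets non-zero probability.
combos = list(product(*features.values()))
p_comp = {c: 1.0 / len(combos) for c in combos}

# Assumed causal world-model prior: only physically plausible pairs are allowed.
def plausible(obj, terrain):
    return (obj, terrain) in {("boat", "water"), ("car", "road")}

# Counterfactual view: zero out implausible combinations and renormalize.
unnorm = {c: p_comp[c] * plausible(*c) for c in combos}
z = sum(unnorm.values())
p_cf = {c: w / z for c, w in unnorm.items()}

print(p_comp)  # uniform over all four combinations, including ("boat", "road")
print(p_cf)    # non-zero mass only on the causally plausible combinations

In this reading, compositional generalization corresponds to p_comp and counterfactual generalization to p_cf, i.e., the same set of compositions filtered through a causal prior.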
Submission Number: 14