Keywords: compositional generalization, compositionality, disentanglement, representation learning, computer vision
TL;DR: We introduce a novel, scalable framework for evaluating compositional generalization, use it to benchmark more than 5k models, and propose a family of neural models that pushes the Pareto frontier on this task.
Abstract: Compositional generalization, a key open challenge in modern machine learning, requires models to predict unknown combinations of known concepts. However, assessing compositional generalization remains a fundamental challenge due to the lack of standardized evaluation protocols and the limitations of current benchmarks, which often favor efficiency over rigor. At the same time, general-purpose vision architectures lack the necessary inductive biases, and existing approaches to endowing them with such biases compromise scalability. As a remedy, this paper introduces: 1) a rigorous evaluation framework that unifies and extends previous approaches while reducing computational requirements from combinatorial to constant; 2) an extensive, up-to-date evaluation of the state of compositional generalization in supervised vision backbones, spanning more than 5000 trained models; 3) Attribute Invariant Networks, a class of models establishing a new Pareto frontier in compositional generalization, achieving a 23.43% accuracy improvement over baselines while reducing parameter overhead from 600% to 16% relative to fully disentangled counterparts.
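To make the setting concrete, below is a minimal, generic sketch of how a compositional-generalization split is typically constructed: the model sees every individual concept during training but is tested on held-out combinations. This is an illustration of the task definition only, not the paper's evaluation framework; the attribute names and held-out pairs are hypothetical.

```python
# Generic compositional-generalization split: train on a subset of
# (shape, color) combinations, test on unseen combinations of the
# same known concepts. Illustrative only; not the paper's framework.
from itertools import product

shapes = ["circle", "square", "triangle"]
colors = ["red", "green", "blue"]

all_combos = list(product(shapes, colors))            # every concept combination
held_out = {("circle", "blue"), ("triangle", "red")}  # unseen at training time

train_combos = [c for c in all_combos if c not in held_out]
test_combos = [c for c in all_combos if c in held_out]

# During training the model observes every shape and every color in
# isolation, but never the held-out pairings; compositional
# generalization is measured as accuracy on test_combos.
print("train:", train_combos)
print("test:", test_combos)
```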
Supplementary Material: zip
Primary Area: Evaluation (e.g., methodology, meta studies, replicability and validity, human-in-the-loop)
Submission Number: 28258