Keywords: Geometric deep learning, equivariance, neural scaling laws
TL;DR: We study empirically how equivariant and non-equivariant networks scale with compute and training samples.
Abstract: Given large datasets and sufficient compute, is it beneficial to design neural architectures for the structure and symmetries of a problem, or is it more efficient to learn them from data? We study empirically how equivariant and non-equivariant networks scale with compute and training samples. Focusing on a benchmark problem of rigid-body interactions and general-purpose transformer architectures, we perform a series of experiments, varying the model size, training steps, and dataset size. We find evidence for three conclusions. First, equivariance improves data efficiency, but training non-equivariant models with data augmentation closes this gap. Second, scaling with compute follows a power law, with equivariant models outperforming non-equivariant ones at each tested compute budget. Finally, the optimal allocation of a compute budget between model size and training duration differs between equivariant and non-equivariant models.
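For illustration only: a compute scaling law of the form L(C) ≈ a · C^(−α) can be fit by linear regression in log-log space. The sketch below is a minimal, hypothetical example; the compute/loss values, the symbols a and α, and the fitting procedure are assumptions for illustration, not data or code from this submission.

import numpy as np

# Hypothetical (compute, loss) pairs -- illustrative placeholders, not results from the paper.
compute = np.array([1e15, 1e16, 1e17, 1e18])  # training compute, e.g. in FLOPs
loss = np.array([0.90, 0.52, 0.31, 0.18])     # test loss at each compute budget

# A power law L(C) = a * C**(-alpha) is linear in log-log space:
# log L = log a - alpha * log C, so a least-squares line fit recovers (a, alpha).
slope, intercept = np.polyfit(np.log(compute), np.log(loss), deg=1)
alpha, a = -slope, np.exp(intercept)

print(f"Fitted scaling law: L(C) ~ {a:.3g} * C^(-{alpha:.3f})")

Fitting such a curve separately for equivariant and non-equivariant models would allow comparing their exponents and offsets at matched compute budgets.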
Submission Number: 23