Keywords: compute-optimal scaling laws, geometric deep learning, interatomic potentials
TL;DR: We find “architecture-dependent” scaling exponents across architectures with increasing levels of symmetry expressivity on the task of learning interatomic potentials.
Abstract: We present an empirical study on the geometric task of learning interatomic potentials, which shows that equivariance matters even more at larger scales; we show clear power-law scaling behaviour with respect to data, parameters, and compute, with “architecture-dependent exponents”. In particular, we observe that equivariant architectures, which leverage task symmetry, scale better than non-equivariant models. Moreover, among equivariant architectures, higher-order representations translate to better scaling exponents. Our analysis also suggests that for compute-optimal training, the data and model sizes should scale in tandem regardless of the architecture. At a high level, these results suggest that, contrary to common belief, we should not leave it to the model to discover fundamental inductive biases such as symmetry, especially as we scale, because they change the inherent difficulty of the task and its scaling laws.
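The abstract does not state the fitted functional form. A minimal sketch in a standard Chinchilla-style parameterization, consistent with the claims above but assumed here rather than taken from the paper, would be

L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}, \qquad C \propto N D, \qquad N^{*}(C) \propto C^{a}, \quad D^{*}(C) \propto C^{b}, \quad a + b = 1,

where N is the number of parameters, D the dataset size, and C the training compute. Under this (assumed) form, “architecture-dependent exponents” means that \alpha and \beta (and hence a = \beta/(\alpha+\beta) and b = \alpha/(\alpha+\beta)) differ across architectures, and “data and model sizes scale in tandem” corresponds to a \approx b.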
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 9568