Equivariant Architectures vs. Transformers: Or, What Gradient Overlaps Reveal About Memorization and Generalization
Keywords: Evaluation, Generalization, PDE, Surrogate
Abstract: We introduce a diagnostic for symmetry learning in PDE surrogates: the similarity between parameter gradients computed on symmetry-related states. On compressible Euler flows, the diagnostic reveals that a UNet exhibits partial but unstable gradient coherence across square-group actions (rotations and reflections) and translations, whereas a ViT reaches lower prediction error yet shows largely orthogonal updates across orbits. This exposes an optimization-symmetry trade-off: stronger inductive biases promote data efficiency but can couple updates rigidly, while flexible architectures optimize easily but ignore physical structure. Our diagnostic offers a reproducible test of whether training dynamics propagate information across symmetry orbits.
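The diagnostic described in the abstract can be sketched as follows: compute the parameter gradient of the surrogate's loss on a state and on a symmetry-transformed copy of that state, then measure their cosine similarity. High similarity suggests the update generalizes across the orbit; near-orthogonality suggests the orbit members are memorized independently. This is a minimal illustration with a linear surrogate and a reflection as the group action; the model, loss, and transform are placeholder assumptions, not the paper's actual architectures.

```python
import numpy as np

def grad_cosine(W, state, target, transform):
    """Cosine similarity between parameter gradients computed on a
    state and on its symmetry-transformed copy.

    Assumes a toy linear surrogate pred = W @ state with MSE loss,
    so dL/dW = (pred - target) x state^T (up to a constant factor).
    Both the state and the target are transformed, since a symmetry
    of the PDE acts on the whole trajectory.
    """
    def grad(x, y):
        pred = W @ x
        return np.outer(pred - y, x)  # MSE gradient w.r.t. W

    g1 = grad(state, target).ravel()
    g2 = grad(transform(state), transform(target)).ravel()
    return float(g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2)))

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
state = rng.standard_normal(8)
target = rng.standard_normal(8)

# Reflection as a stand-in for one square-group element.
reflect = lambda x: x[::-1]

# Identity transform: gradients coincide, cosine similarity is 1.
c_id = grad_cosine(W, state, target, lambda x: x)
# Reflection: similarity reveals how coupled the two updates are.
c_ref = grad_cosine(W, state, target, reflect)
```

In practice one would replace the analytic gradient with automatic differentiation through the actual UNet or ViT, and average the similarity over states, group elements, and training checkpoints.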
Submission Number: 104