TL;DR: We introduce NTKs for group convolutional neural networks, establish a relation between data augmentation and GCNNs, and show that equivariant NTKs outperform non-equivariant NTKs in a number of application domains.
Abstract: Little is known about the training dynamics of equivariant neural networks, in particular how they compare to the data-augmented training of their non-equivariant counterparts. Recently, neural tangent kernels (NTKs) have emerged as a powerful tool to analytically study the training dynamics of wide neural networks. In this work, we take an important step towards a theoretical understanding of the training dynamics of equivariant models by deriving neural tangent kernels for a broad class of equivariant architectures based on group convolutions. As a demonstration of the capabilities of our framework, we establish an interesting relationship between data augmentation and group convolutional networks. Specifically, we prove that they share the same expected prediction over initializations at all training times, even off the data manifold. In this sense, they have the same training dynamics. We demonstrate in numerical experiments that this still holds approximately for finite-width ensembles. By implementing equivariant NTKs for roto-translations in the plane ($G=C_{n}\ltimes\mathbb{R}^{2}$) and 3D rotations ($G=\mathrm{SO}(3)$), we show that equivariant NTKs outperform their non-equivariant counterparts as kernel predictors for histological image classification and quantum mechanical property prediction.
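To make the kind of statement involved concrete (the notation below is a generic NTK sketch for illustration, not the paper's own): the empirical NTK of a network $f(x;\theta)$ is $\Theta(x,x')=\nabla_\theta f(x;\theta)^{\top}\nabla_\theta f(x';\theta)$, which becomes deterministic in the infinite-width limit. In this language, the correspondence asserted above can be written as an equality of ensemble-averaged predictions, $\mathbb{E}_{\theta_0}\!\big[f^{\mathrm{GCNN}}_t(x)\big]=\mathbb{E}_{\theta_0}\!\big[f^{\mathrm{aug}}_t(x)\big]$ for all inputs $x$ (not only those on the data manifold) and all training times $t$, where $f^{\mathrm{aug}}$ denotes a corresponding non-equivariant network trained with full data augmentation over the group.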
Lay Summary: A significant proportion of machine learning problems are subject to inherent symmetries, e.g. translation or rotation symmetry in image classification. This underlying property can either be learned by means of data augmentation or enforced through the model structure itself, as in so-called equivariant networks. However, the training dynamics of the latter are not well understood, making a systematic comparison between the two approaches difficult.
In this work, we extend a mathematical tool called the neural tangent kernel (NTK) to equivariant networks. In the regime of wide hidden layers, it allows for an analytic solution of the training dynamics and has already been applied with great success to simpler models. We use our extension to find an explicit connection between data augmentation and equivariance: for a particular class of conventional networks trained with data augmentation, we find corresponding equivariant networks that share the same expected training dynamics in the limit of infinitely wide hidden layers. We further implement the general analytic relations we derive for rotation and translation symmetries.
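As background on why wide hidden layers admit an analytic solution (a standard NTK result, stated here under the usual assumptions of the NTK parameterization, gradient flow, and squared-error loss; the notation is generic and not taken from this paper): for training inputs $X$ with targets $Y$, the prediction averaged over initializations evolves as $\mathbb{E}\big[f_t(x)\big]=\Theta(x,X)\,\Theta(X,X)^{-1}\big(I-e^{-\eta\,\Theta(X,X)\,t}\big)Y$, where $\Theta$ is the architecture-dependent neural tangent kernel and $\eta$ the learning rate. Equivariant architectures change the kernel $\Theta$, which is how symmetry enters these dynamics.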
The presented framework and its implications contribute to the ongoing debate on whether symmetry should be enforced by construction or learned from data, by establishing an explicit correspondence between these two approaches.
Link To Code: https://github.com/PhilippMisofCH/equivariant-ntk
Primary Area: Theory->Deep Learning
Keywords: neural tangent kernel, equivariant neural networks, geometric deep learning, data augmentation
Submission Number: 11295