How Structured Data Guides Feature Learning: A Case Study of the Parity Problem

Published: 07 Nov 2023, Last Modified: 13 Dec 2023M3L 2023 PosterEveryoneRevisionsBibTeX
Keywords: neural network optimization, feature learning, mean-field Langevin dynamics
TL;DR: We studied the interplay between structured data (in the form of input anisotropy) and the efficiency of feature learning by considering classification of the k-sparse parity using two-layer neural network optimized by noisy gradient descent.
Abstract: Recent works have shown that neural networks optimized by gradient-based methods can adapt to sparse or low-dimensional target functions through feature learning; an often studied target is classification of the sparse parity function on the unit hypercube. However, such isotropic data setting does not capture the anisotropy and low intrinsic dimensionality exhibited in realistic datasets. In this work, we address this shortcoming by studying how feature learning interacts with structured (anisotropic) input data: we consider the classification of sparse parity on high-dimensional orthotope where the feature coordinates have varying magnitudes. Specifically, we analyze the learning complexity of the mean-field Langevin dynamics (MFLD), which describes the noisy gradient descent update on two-layer neural network, and show that the statistical complexity (i.e. sample size) and computational complexity (i.e. network width) of MFLD can both be improved when prominent directions of the anisotropic input data aligns with the support of the target function. Moreover, we demonstrate the benefit of feature learning by establishing a kernel lower bound on the classification error, which applies to neural networks in the lazy regime.
Submission Number: 84