Learning from higher-order correlations, efficiently: hypothesis tests, random features, and neural networks
Keywords: higher-order cumulant, hypothesis test, neural network, random features, low-degree method
TL;DR: We analyse the statistical-to-computational gap in learning from higher-order data correlations and show that neural networks learn these correlations more efficiently than kernel methods.
Abstract: Neural networks excel at discovering statistical patterns in
high-dimensional data sets. In practice, higher-order cumulants, which quantify
the non-Gaussian correlations between three or more variables, are particularly
important for the performance of neural networks. But how efficient are neural
networks at extracting features from higher-order cumulants? We study this
question in the spiked cumulant model, where the statistician needs to recover a
privileged direction or ``spike'' from the order-$p\ge 4$ cumulants
of $d$-dimensional inputs.
We first discuss the fundamental statistical and
computational limits of recovering the spike by analysing the number of
samples $n$ required to strongly distinguish between inputs from the spiked
cumulant model and isotropic Gaussian inputs.
Existing literature established a wide statistical-to-computational gap for this problem. We deepen this line of work by deriving an exact formula for the norm of the likelihood ratio, which proves that statistical
distinguishability requires $n\gtrsim d$ samples, while distinguishing the two
distributions in polynomial time requires $n \gtrsim d^2$ samples for a wide
class of algorithms, namely those covered by the low-degree conjecture.
Numerical experiments show that neural networks do indeed learn to distinguish
the two distributions with quadratic sample complexity, while ``lazy'' methods
like random features perform no better than random guessing in this regime. Our
results show that neural networks efficiently extract information from higher-order
correlations in the spiked cumulant model, and reveal a large gap between
the amount of data required by neural networks and by random features to learn from
higher-order cumulants.
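As an illustration of the kind of distribution studied here (a sketch, not the paper's exact definitions), one can draw inputs $x = z + \nu u$ with $z$ isotropic Gaussian, $u$ a unit spike, and $\nu$ a Rademacher-type latent variable, then rescale along $u$ so the covariance stays isotropic. The amplitude `beta`, dimensions, and the whitening shortcut below are all illustrative assumptions; in such a model the spike survives only in cumulants of order $\ge 4$, visible as the excess kurtosis of the projection onto $u$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, beta = 64, 10_000, 2.0  # illustrative dimensions and spike amplitude

u = rng.standard_normal(d)
u /= np.linalg.norm(u)  # unit spike direction

# Null model: isotropic Gaussian inputs.
z_null = rng.standard_normal((n, d))

# Spiked model (illustrative): Gaussian noise plus a Rademacher latent
# along u, then rescaled along u so the covariance is again isotropic --
# second-order statistics no longer reveal the spike.
nu = beta * rng.choice([-1.0, 1.0], size=n)
x = rng.standard_normal((n, d)) + nu[:, None] * u[None, :]
proj = x @ u  # remove the excess variance along u
x = x + np.outer(proj * (1.0 / np.sqrt(1.0 + beta**2) - 1.0), u)

def excess_kurtosis(v):
    """Fourth standardized cumulant; zero for a Gaussian."""
    v = (v - v.mean()) / v.std()
    return np.mean(v**4) - 3.0

# Along the spike, the two distributions differ at fourth order.
print(excess_kurtosis(x @ u))       # clearly nonzero
print(excess_kurtosis(z_null @ u))  # close to zero
```

A statistician who knows $u$ can thus test the hypothesis from the fourth-order statistic alone; the hard part analysed in the paper is doing so without knowing $u$, which is where the $n \gtrsim d$ versus $n \gtrsim d^2$ gap arises.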
Primary Area: Learning theory
Submission Number: 17639