Learning from higher-order correlations, efficiently: hypothesis tests, random features, and neural networks
Keywords: higher-order cumulant, hypothesis test, neural network, random features, low-degree method
TL;DR: We analyse the statistical-to-computational gap in learning from higher-order data correlations and show that neural networks learn these correlations more efficiently than kernel methods.
Abstract: Neural networks excel at discovering statistical patterns in
high-dimensional data sets. In practice, higher-order cumulants, which quantify
the non-Gaussian correlations between three or more variables, are particularly
important for the performance of neural networks. But how efficient are neural
networks at extracting features from higher-order cumulants? We study this
question in the spiked cumulant model, where the statistician needs to recover a
privileged direction or ``spike'' from the order-$p\ge 4$ cumulants
of $d$-dimensional inputs.
We first discuss the fundamental statistical and
computational limits of recovering the spike by analysing the number of
samples $n$ required to strongly distinguish between inputs from the spiked
cumulant model and isotropic Gaussian inputs.
Existing literature established a wide statistical-to-computational gap for this problem. We deepen this line of work by deriving an exact formula for the norm of the likelihood ratio, which proves that statistical
distinguishability requires $n\gtrsim d$ samples, while distinguishing the two
distributions in polynomial time requires $n \gtrsim d^2$ samples for a wide
class of algorithms, namely those covered by the low-degree conjecture.
Numerical experiments show that neural networks do indeed learn to distinguish
the two distributions with quadratic sample complexity, while ``lazy'' methods
like random features perform no better than random guessing in this regime. Our
results show that neural networks efficiently extract information from higher-order
correlations in the spiked cumulant model, and reveal a large gap between
the amount of data required by neural networks and by random features to learn from
higher-order cumulants.
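As an illustration of the kind of distribution studied here (a sketch, not the paper's exact definitions), one can draw inputs $x = z + \nu u$ with $z$ isotropic Gaussian, $u$ a unit spike, and $\nu$ a Rademacher-type latent variable, then rescale along $u$ so the covariance stays isotropic. The amplitude `beta`, dimensions, and the whitening shortcut below are all illustrative assumptions; in such a model the spike survives only in cumulants of order $\ge 4$, visible as the excess kurtosis of the projection onto $u$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, beta = 64, 10_000, 2.0  # illustrative dimensions and spike amplitude

u = rng.standard_normal(d)
u /= np.linalg.norm(u)  # unit spike direction

# Null model: isotropic Gaussian inputs.
z_null = rng.standard_normal((n, d))

# Spiked model (illustrative): Gaussian noise plus a Rademacher latent
# along u, then rescaled along u so the covariance is again isotropic --
# second-order statistics no longer reveal the spike.
nu = beta * rng.choice([-1.0, 1.0], size=n)
x = rng.standard_normal((n, d)) + nu[:, None] * u[None, :]
proj = x @ u  # remove the excess variance along u
x = x + np.outer(proj * (1.0 / np.sqrt(1.0 + beta**2) - 1.0), u)

def excess_kurtosis(v):
    """Fourth standardized cumulant; zero for a Gaussian."""
    v = (v - v.mean()) / v.std()
    return np.mean(v**4) - 3.0

# Along the spike, the two distributions differ at fourth order.
print(excess_kurtosis(x @ u))       # clearly nonzero
print(excess_kurtosis(z_null @ u))  # close to zero
```

A statistician who knows $u$ can thus test the hypothesis from the fourth-order statistic alone; the hard part analysed in the paper is doing so without knowing $u$, which is where the $n \gtrsim d$ versus $n \gtrsim d^2$ gap arises.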
Primary Area: Learning theory
Submission Number: 17639