Keywords: independent component analysis, feature learning, high dimensional statistics, FastICA, stochastic gradient descent
TL;DR: We demonstrate that independent component analysis, an unsupervised learning algorithm that learns filters similar to those of deep convolutional networks, can extract non-Gaussian features from high-dimensional inputs with linear sample complexity.
Abstract: Feature learning at the scale of deep neural networks remains poorly understood due to the complexity of deep network dynamics. Independent component analysis (ICA) provides a simple unsupervised model for feature learning, as it learns filters similar to those of deep networks. ICA extracts these features from the higher-order correlations of the inputs, which is a computationally hard task in high dimensions with a sample complexity of at least $n \gtrsim D^2$ for $D$-dimensional inputs. In practice, this difficulty is overcome by running ICA in the $d$-dimensional subspace spanned by the leading principal components of the inputs, with $d$ often taken to be $d = D/4$. However, there exist no theoretical guarantees for this procedure. Here, we first conduct systematic experiments on ImageNet to demonstrate that running FastICA in a finite subspace of $d \sim O_D(1)$ dimensions yields non-Gaussian directions in the $D$-dimensional image space. We then introduce a “subspace model” for synthetic data, and prove that FastICA does indeed recover the most non-Gaussian direction with a sample complexity that is linear in the input dimension. We finally show experimentally that deep convolutional networks trained on ImageNet exhibit behaviour consistent with FastICA: during training, they converge to the principal subspace of image patches before or when they find non-Gaussian directions. By providing quantitative, rigorous insights into the workings of FastICA, our study thus unveils a plausible feature-learning mechanism in deep convolutional neural networks.
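As a rough illustration of the preprocessing pipeline described in the abstract (not the authors' code), here is a minimal sketch using scikit-learn: project flattened image patches onto their leading $d = D/4$ principal components with whitening, run FastICA inside that subspace, and map the recovered unmixing filters back to the $D$-dimensional patch space. The patch dimensions, variable names, and the use of scikit-learn's `PCA`/`FastICA` are assumptions made for illustration.

```python
# Minimal sketch: FastICA in the leading principal subspace of image patches.
# Assumption: scikit-learn's PCA/FastICA stand in for the paper's FastICA setup.
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)

# Stand-in for n image patches of side p, flattened to D = p * p dimensions.
n, p = 50_000, 8
D = p * p
patches = rng.standard_normal((n, D))  # replace with real ImageNet patches

# Step 1: project onto the d leading principal components (d = D/4, as in the abstract),
# whitening the data so each retained component has unit variance.
d = D // 4
pca = PCA(n_components=d, whiten=True)
patches_pca = pca.fit_transform(patches)

# Step 2: run FastICA inside the d-dimensional principal subspace
# (data is already whitened, so no further whitening is needed).
ica = FastICA(n_components=d, whiten=False, max_iter=500)
sources = ica.fit_transform(patches_pca)

# Step 3: compose the ICA unmixing matrix with the (whitened) PCA projection
# to express the learnt non-Gaussian directions as filters in the original
# D-dimensional patch space.
filters = ica.components_ @ (pca.components_ / np.sqrt(pca.explained_variance_)[:, None])
print(filters.shape)  # (d, D): one filter per recovered component
```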
Student Paper: Yes
Submission Number: 35