TL;DR: We investigate the eigenvalues of the linear layers in deep networks and show that the distributions develop heavy-tail behavior during training.
Abstract: The linear transformations in converged deep networks show fast eigenvalue decay. The distribution of eigenvalues looks like a Heavy-tail distribution, where the vast majority of eigenvalues is small, but not actually zero, and only a few spikes of large eigenvalues exist.
We use a stochastic approximator to generate histograms of eigenvalues. This allows us to investigate layers with hundreds of thousands of dimensions. We show how the distributions change over the course of image net training, converging to a similar heavy-tail spectrum across all intermediate layers.
1 Reply
Loading