Neural Networks with Block Diagonal Inner Product Layers


Nov 03, 2017 (modified: Nov 03, 2017) ICLR 2018 Conference Blind Submission
  • Abstract: Two difficulties continue to burden deep learning researchers and users: (1) neural networks are cumbersome tools that grow with the complexity of the learning problem, and (2) the activity of the fully connected, or inner product, layers remains mysterious. We address both issues by considering a modified version of the fully connected layer that we call a block diagonal inner product layer. These modified layers have block diagonal weight matrices, turning a single fully connected layer into a set of densely connected neuron groups. This method condenses network storage and speeds up run time without a significant adverse effect on testing accuracy, thus offering a new approach to the first problem. Comparing how the variance and singular values of a layer's weights change during training as the number of blocks varies gives insight into the second problem. The ratio of the variance of the weights remains constant throughout training; that is, the relationship in structure is preserved in the final parameter distribution. We observe that trained inner product layers have structure similar to that of truly random matrices with i.i.d. entries, and that each block in a block diagonal inner product layer behaves like a smaller copy, giving a better understanding of the nature of inner product layers.
  • TL;DR: We look at neural networks with block diagonal inner product layers for efficiency and offer some analysis.
  • Keywords: Deep Learning, Neural Networks, Random Matrix Theory
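
To make the layer described in the abstract concrete, here is a minimal NumPy sketch of a block diagonal inner product layer. It is an illustration under stated assumptions, not the authors' implementation: the function names (`block_diag_forward`, `to_dense`), the equal block sizes, the zero biases, and the Gaussian initialization are all hypothetical choices made for this example. It checks that applying each block to its own slice of the input matches multiplying by the equivalent dense block diagonal matrix, compares parameter counts, and confirms that the singular values of the full matrix are the union of the singular values of its blocks.

```python
import numpy as np

def block_diag_forward(x, blocks, biases):
    """Apply a block diagonal inner product layer (hypothetical helper).

    x:      (batch, n_in) input; n_in is the sum of the block input widths
    blocks: list of (in_i, out_i) weight matrices, one per neuron group
    biases: list of (out_i,) bias vectors, one per neuron group
    """
    outputs = []
    start = 0
    for W, b in zip(blocks, biases):
        in_i = W.shape[0]
        # Each neuron group sees only its own slice of the input.
        outputs.append(x[:, start:start + in_i] @ W + b)
        start += in_i
    return np.concatenate(outputs, axis=1)

def to_dense(blocks):
    """Assemble the equivalent dense weight matrix, zero off the diagonal blocks."""
    n_in = sum(W.shape[0] for W in blocks)
    n_out = sum(W.shape[1] for W in blocks)
    dense = np.zeros((n_in, n_out))
    r = c = 0
    for W in blocks:
        dense[r:r + W.shape[0], c:c + W.shape[1]] = W
        r += W.shape[0]
        c += W.shape[1]
    return dense

# Assumed toy sizes: 8 inputs, 4 outputs, 2 equal blocks.
rng = np.random.default_rng(0)
n_in, n_out, n_blocks = 8, 4, 2
blocks = [rng.standard_normal((n_in // n_blocks, n_out // n_blocks))
          for _ in range(n_blocks)]
biases = [np.zeros(n_out // n_blocks) for _ in range(n_blocks)]
x = rng.standard_normal((3, n_in))

# Blockwise evaluation versus the dense block diagonal matrix.
y_block = block_diag_forward(x, blocks, biases)
y_dense = x @ to_dense(blocks) + np.concatenate(biases)

# Weight storage: a fully connected layer versus the stored blocks only.
dense_params = n_in * n_out
block_params = sum(W.size for W in blocks)

# Singular values of the full matrix versus those of the individual blocks.
sv_dense = np.sort(np.linalg.svd(to_dense(blocks), compute_uv=False))
sv_blocks = np.sort(np.concatenate(
    [np.linalg.svd(W, compute_uv=False) for W in blocks]))

print(np.allclose(y_block, y_dense), dense_params, block_params)
```

With equal-sized blocks, the stored weight count drops by a factor equal to the number of blocks, which is the storage saving the abstract refers to. Any run-time gain depends on how the blockwise products are executed, which this sketch does not attempt to measure.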