Keywords: Dynamical Isometry, Gradient Propagation, Deep Learning, Small-world Networks, Efficient Inference
Abstract: In this paper, we address two fundamental research questions in neural architecture design: (i) How does the architecture topology impact gradient flow during training? (ii) Can certain topological characteristics of deep networks indicate a priori (i.e., without training) which models, with different numbers of parameters/FLOPs/layers, achieve similar accuracy? To this end, we formulate the problem of deep learning architecture design from a network science perspective and introduce a new metric called NN-Mass to quantify how effectively information flows through a given architecture. We establish a theoretical link between NN-Mass, a topological property of neural architectures, and gradient flow characteristics (e.g., Layerwise Dynamical Isometry). As such, NN-Mass can identify models with similar accuracy despite significantly different size/compute requirements. Detailed experiments on both synthetic and real datasets (e.g., MNIST, CIFAR-10, CIFAR-100, ImageNet) provide extensive evidence for our insights. Finally, we show that the closed-form equation of our theoretically grounded NN-Mass metric enables us to design efficient architectures directly, without time-consuming training and search.
One-sentence Summary: We quantify the impact of topological properties of deep networks on their gradient propagation and resulting performance.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=A8jxIXx-s
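The abstract refers to a closed-form NN-Mass equation but does not reproduce it here. The sketch below is a hypothetical, illustrative Python version of a density-style topological score for a DenseNet-type cell: it counts the skip connections a cell actually uses against the skip connections it could have, and scales that density by the cell's width and depth. The function name `cell_score` and the exact counting convention are assumptions made for illustration, not the paper's definition; see the reviewed PDF above for the actual NN-Mass formula.

```python
# Hypothetical sketch: a density-style topological score for a DenseNet-type cell.
# This is NOT the paper's exact NN-Mass equation; it only illustrates the idea of
# (skip connections present) / (skip connections possible), scaled by width * depth.

def cell_score(depth, width, skip_links_per_layer):
    """Illustrative NN-Mass-style score for one cell.

    depth:  number of layers in the cell
    width:  number of neurons per layer
    skip_links_per_layer: list of length `depth`; entry i gives how many
        long-range (skip) inputs layer i actually receives.
    """
    # Assumed convention: layer i can, at most, receive skip inputs from all
    # neurons in layers 0 .. i-2 (its immediate predecessor is a regular link).
    possible = sum(max(i - 1, 0) * width for i in range(depth))
    actual = sum(skip_links_per_layer)
    density = actual / possible if possible > 0 else 0.0
    return density * width * depth


if __name__ == "__main__":
    # Compare two hypothetical cells with the same width/depth but different
    # numbers of skip connections; the denser cell receives the higher score.
    sparse = cell_score(depth=8, width=16, skip_links_per_layer=[0, 0, 4, 4, 4, 4, 4, 4])
    dense = cell_score(depth=8, width=16, skip_links_per_layer=[0, 0, 16, 16, 16, 16, 16, 16])
    print(f"sparse cell score: {sparse:.2f}")
    print(f"dense cell score:  {dense:.2f}")
```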