S-System, Geometry, Learning, and Optimization: A Theory of Neural Networks

27 Sep 2018 (modified: 21 Dec 2018) · ICLR 2019 Conference Blind Submission · Readers: Everyone
  • Abstract: We present a formal measure-theoretic theory of neural networks (NNs) built on {\it probability coupling theory}. In particular, we present an algorithmic framework, the Hierarchical Measure Group and Approximate System (HMGAS), nicknamed S-System, of which NNs are special cases. Among many other results, the framework enables us to prove that: 1) NNs implement the {\it renormalization group (RG)} using information geometry, which identifies the large-scale property being renormalized as the dual Bregman divergence and completes the analogy between NNs and RG; and 2) under a set of {\it realistic} boundedness and diversity conditions, for {\it large-size nonlinear deep} NNs with a class of losses, including the hinge loss, all local minima are global minima with zero loss, a result proved using random matrix theory.
  • Keywords: neural network theory, probability measure theory, probability coupling theory, S-System, optimization, random matrix, renormalization group, information geometry, coarse graining, hierarchy, activation function, symmetry
  • TL;DR: We present a formal measure-theoretic theory of neural networks (NNs) that quantitatively shows NNs renormalize on semantic difference, and that under practical conditions large-size deep nonlinear NNs can optimize their objective functions to zero loss.