On Connecting Deep Trigonometric Networks with Deep Gaussian Processes: Covariance, Expressivity, and Neural Tangent Kernel
Abstract: A deep Gaussian process (DGP), used as a model prior in Bayesian learning, intuitively exploits the expressive power of function composition. DGPs also offer diverse modeling capabilities, but inference is challenging because marginalization over the latent function space is intractable. By Bochner’s theorem, a DGP with a squared exponential kernel can be viewed as a deep trigonometric network consisting of random feature layers, sine and cosine activation units, and random weight layers. In the wide limit with a bottleneck, we show that the weight-space view yields the same effective covariance functions obtained previously in function space. Moreover, varying the prior distributions over the network parameters is equivalent to employing different kernels. Consequently, DGPs can be translated into deep bottlenecked trigonometric networks, from which the exact maximum a posteriori estimate can be obtained. Interestingly, the network representation enables the study of the DGP’s neural tangent kernel, which may also reveal the mean of the intractable predictive distribution. Statistically, unlike shallow networks, deep networks of finite width have a covariance that deviates from the limiting kernel, and the inner and outer widths may play different roles in feature learning. Numerical simulations are presented to support our findings.
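To make the weight-space construction in the abstract concrete, the following is a minimal illustrative sketch (not the paper's implementation) of a deep trigonometric network: each layer draws random frequencies from the squared exponential kernel's spectral density (Bochner's theorem), applies sine/cosine feature maps, and projects through a random weight layer into a narrow bottleneck. The function names, widths, and scaling choices are assumptions made for illustration only.

```python
import numpy as np

def trig_features(X, freqs):
    """Random trigonometric features: phi(x) = [cos(Wx), sin(Wx)] / sqrt(m).
    With W ~ N(0, I / lengthscale^2), phi(x)^T phi(x') approximates the
    squared exponential kernel (Bochner's theorem)."""
    proj = X @ freqs.T                                   # (n, m) random projections
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=1) / np.sqrt(freqs.shape[0])

def deep_trig_network(X, depth=2, width=1024, bottleneck=2, lengthscale=1.0, seed=0):
    """Hypothetical sketch: stack random-feature layers with random linear
    'weight' layers (bottlenecks), mimicking a sample from a DGP-like prior
    in weight space. Widths and prior scales are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    H = X
    for _ in range(depth):
        # Frequencies drawn from the SE kernel's spectral density.
        freqs = rng.normal(scale=1.0 / lengthscale, size=(width, H.shape[1]))
        Phi = trig_features(H, freqs)                    # (n, 2*width) trig feature layer
        # Random outer weights map the wide feature layer to a narrow bottleneck.
        W_out = rng.normal(size=(2 * width, bottleneck))
        H = Phi @ W_out
    return H

# Usage: two trig-feature layers on toy 1-D inputs.
X = np.linspace(-3.0, 3.0, 50)[:, None]
F = deep_trig_network(X, depth=2, width=1024, bottleneck=2)
print(F.shape)  # (50, 2)
```

In this reading, the wide trigonometric feature layers play the role of the (approximate) kernel expansion, while the narrow random weight layers act as the bottlenecks whose widths the abstract distinguishes from the feature-layer widths.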
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=RljcyKWY3K&referrer=%5BTMLR%5D(%2Fgroup%3Fid%3DTMLR)
Changes Since Last Submission: The font has been changed to match the latest TMLR format.
Assigned Action Editor: ~Jaehoon_Lee2
Submission Number: 323