Memorization and Optimization in Deep Neural Networks with Minimum Over-parameterization

Simone Bombari; Mohammad Hossein Amani; Marco Mondelli

Published: 31 Oct 2022, Last Modified: 06 Apr 2025 | NeurIPS 2022 Accept | Readers: Everyone

Keywords: deep neural networks, Neural Tangent Kernel, minimum over-parameterization, memorization capacity, gradient descent training

TL;DR: We show that the NTK is well-conditioned for deep neural networks with the minimum possible over-parameterization ($\Omega(N)$ parameters and, hence, $\Omega(\sqrt{N})$ neurons, $N$ being the number of training samples).

Abstract: The Neural Tangent Kernel (NTK) has emerged as a powerful tool to provide memorization, optimization and generalization guarantees in deep neural networks. A line of work has studied the NTK spectrum for two-layer and deep networks with at least one layer with $\Omega(N)$ neurons, $N$ being the number of training samples. Furthermore, there is increasing evidence suggesting that deep networks with sub-linear layer widths are powerful memorizers and optimizers, as long as the number of parameters exceeds the number of samples. Thus, a natural open question is whether the NTK is well-conditioned in such a challenging sub-linear setup. In this paper, we answer this question in the affirmative. Our key technical contribution is a lower bound on the smallest NTK eigenvalue for deep networks with the minimum possible over-parameterization: up to logarithmic factors, the number of parameters is $\Omega(N)$ and, hence, the number of neurons is as small as $\Omega(\sqrt{N})$. To showcase the applicability of our NTK bounds, we provide two results concerning memorization capacity and optimization guarantees for gradient descent training.
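The step from $\Omega(N)$ parameters to $\Omega(\sqrt{N})$ neurons is a counting argument: a fully connected layer between two layers of width about $\sqrt{N}$ already carries about $N$ weights. The sketch below only illustrates that arithmetic. The helper names, the choice of equal hidden widths, the input dimension, and the omission of biases and logarithmic factors are all assumptions made for illustration; they are not the architecture or the assumptions of the paper.

```python
import math


def count_parameters(widths):
    """Count the weights of a fully connected network (biases omitted)."""
    return sum(n_in * n_out for n_in, n_out in zip(widths, widths[1:]))


def sqrt_width_network(num_samples, input_dim, depth):
    """Hypothetical layout: `depth` hidden layers of width ceil(sqrt(N)), scalar output."""
    width = math.isqrt(num_samples - 1) + 1  # equals ceil(sqrt(N)) for N >= 1
    return [input_dim] + [width] * depth + [1]


N = 10_000  # number of training samples (illustrative value)
widths = sqrt_width_network(N, input_dim=100, depth=3)
params = count_parameters(widths)
neurons = sum(widths[1:-1])  # hidden neurons only

print(f"widths={widths}, parameters={params}, hidden neurons={neurons}")
```

With these illustrative values, 300 hidden neurons give 30,100 weights, which exceeds the 10,000 samples even though every layer width is far below $N$.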

Supplementary Material: zip

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/memorization-and-optimization-in-deep-neural/code)
