Growing Tiny Networks: Spotting Expressivity Bottlenecks and Fixing Them Optimally

20 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: optimization
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: neural network, expressivity, optimization, quadratic programming, layer growth
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We quantify the lack of expressivity in neural networks and propose an algorithm to fix such bottlenecks by adding neurons where appropriate.
Abstract: Machine learning tasks are generally formulated as optimization problems, where one searches for an optimal function within a certain functional space. In practice, parameterized functional spaces are considered, so that gradient descent can be performed. With neural networks, one must first choose and fix an architecture (number and type of layers) and then optimize its parameters (connection weights). Any change to the layer structure requires retraining the network. This process restricts the search in the functional space to what is expressible by the chosen architecture. The common remedy for this expressivity bottleneck is a costly and slow hyper-parameter search over architectural choices. Through a careful characterization of neural network expressivity, we show that the information about desirable architectural changes can be extracted directly during the backpropagation pass. Building on this, we propose a new, mathematically well-grounded method to detect expressivity bottlenecks \emph{on the fly} and to resolve them by adding suitable neurons when and where needed. Thus, while the standard approach requires large networks, in terms of number of neurons per layer, for expressivity and optimization reasons, we are able to start with very small neural networks and let them grow appropriately. As a proof of concept, we show results on the CIFAR dataset, matching the accuracy of large neural networks with competitive training time, while removing the need for a standard architectural hyper-parameter search.
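To make the growth idea concrete, below is a minimal, hypothetical PyTorch sketch of widening a hidden layer during training. It is not the paper's algorithm: the trigger used here (the gradient norm on the hidden pre-activations after backpropagation) is only a crude stand-in for the expressivity-bottleneck criterion, and the small random initialization of the added neurons stands in for the optimal neuron update the paper derives; all names (GrowableMLP, grow_hidden, threshold) are illustrative assumptions.

```python
# Hypothetical sketch of on-the-fly layer growth for a two-layer MLP.
import torch
import torch.nn as nn


class GrowableMLP(nn.Module):
    def __init__(self, d_in: int, d_hidden: int, d_out: int):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_out)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Keep the hidden pre-activations so their gradient is available
        # after loss.backward() and can serve as a crude bottleneck proxy.
        self.pre_act = self.fc1(x)
        if self.pre_act.requires_grad:
            self.pre_act.retain_grad()
        return self.fc2(torch.relu(self.pre_act))

    @torch.no_grad()
    def grow_hidden(self, n_new: int) -> None:
        """Widen the hidden layer by n_new neurons, keeping existing weights."""
        d_in, d_hidden = self.fc1.in_features, self.fc1.out_features
        d_out = self.fc2.out_features
        new_fc1 = nn.Linear(d_in, d_hidden + n_new)
        new_fc2 = nn.Linear(d_hidden + n_new, d_out)
        # Copy the old weights; new neurons start with small weights so the
        # function computed by the network barely changes at growth time.
        new_fc1.weight[:d_hidden] = self.fc1.weight
        new_fc1.bias[:d_hidden] = self.fc1.bias
        new_fc1.weight[d_hidden:] *= 1e-2
        new_fc2.weight[:, :d_hidden] = self.fc2.weight
        new_fc2.weight[:, d_hidden:] *= 1e-2
        new_fc2.bias.copy_(self.fc2.bias)
        self.fc1, self.fc2 = new_fc1, new_fc2
        # NOTE: the optimizer must be re-created (or handed the new parameters)
        # after growth, since it still references the old tensors.


# Example growth check after a backward pass (threshold chosen arbitrarily):
# loss.backward()
# if model.pre_act.grad.norm() > threshold:
#     model.grow_hidden(n_new=4)
#     optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
```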
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: pdf
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2716