TL;DR: We jointly analyzed the expressivity, convergence, and generalization of deep networks, and discovered a "No Free Lunch" behavior for deep networks defined in an architecture space.
Abstract: The prosperity of deep learning and automated machine learning (AutoML) is largely rooted in the development of novel neural networks -- but what defines and controls the "goodness" of networks in an architecture space? Test accuracy, the gold standard in AutoML, is closely related to three aspects: (1) expressivity (how complicated a function the network can approximate over the training data); (2) convergence (how fast the network reaches a low training error under gradient descent); (3) generalization (whether a trained network generalizes from the training data to unseen samples with a low test error). However, most previous theory papers focus on fixed model structures, largely ignoring the sophisticated networks used in practice. To facilitate the interpretation and understanding of architecture design by AutoML, we aim to connect a bigger picture: how does the architecture jointly impact its expressivity, convergence, and generalization? We demonstrate a "no free lunch" behavior for networks in an architecture space: given a fixed budget on the number of parameters, no single architecture is optimal in all three aspects. In other words, separately optimizing expressivity, convergence, and generalization yields different networks in the architecture space. Our analysis explains a wide range of observations in AutoML, and experiments on popular benchmarks confirm our theoretical results. Our code is attached in the supplement.
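The "fixed budget on the number of parameters" premise can be made concrete with a small sketch (illustrative only, not the paper's architecture space or method): two fully connected networks of different depth and width can have nearly the same parameter count, yet they are distinct points in an architecture space and may differ in expressivity, convergence, and generalization. The layer widths below are hypothetical choices for illustration.

```python
def mlp_param_count(widths):
    """Total weights + biases of a fully connected net with the given layer widths."""
    return sum(w_in * w_out + w_out for w_in, w_out in zip(widths, widths[1:]))

# Hypothetical architectures: same input (784) and output (10) dimensions,
# roughly the same parameter budget, but different depth/width trade-offs.
wide_shallow = [784, 256, 10]             # 1 hidden layer, width 256
narrow_deep = [784, 180, 180, 180, 10]    # 3 hidden layers, width 180

print(mlp_param_count(wide_shallow))   # -> 203530
print(mlp_param_count(narrow_deep))    # -> 208270
```

Both networks fit a budget of roughly 2e5 parameters (within about 3% of each other), so a budget constraint alone does not pin down a single architecture; the claim is that no one point in this space is simultaneously best in all three aspects.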
Keywords: Neural Network Architecture, Neural Architecture Search, Architecture Benchmarks, Convergence, Expressivity, Generalization
Submission Checklist: Yes
Broader Impact Statement: Yes
Paper Availability And License: Yes
Code Of Conduct: Yes
Code And Dataset Supplement: zip
Steps For Environmental Footprint Reduction During Development:

### Prerequisites

- Ubuntu 16.04
- Python 3.6.9
- CUDA 11.1 (lower versions may work but were not tested)
- NVIDIA GPU + CuDNN v7.3

This code has been tested on a V100 GPU. Configurations may need to be changed on other platforms.

### Installation

* Install dependencies:
```bash
pip install -r requirements.txt
```
* Download Tiny ImageNet (CIFAR-10 and CIFAR-100 will be automatically downloaded by Torchvision): https://gist.github.com/moskomule/2e6a9a463f50447beca4e64ab4699ac4
CPU Hours: 800
GPU Hours: 800
TPU Hours: 0
Evaluation Metrics: No
Estimated CO2e Footprint: 103.68