“No Free Lunch” in Neural Architectures? A Joint Analysis of Expressivity, Convergence, and Generalization

Wuyang Chen; Wei Huang; Zhangyang Wang

Published: 16 May 2023, Last Modified: 05 Sept 2023

Venue: AutoML 2023 MainTrack

Readers: Everyone

Keywords: Neural Network Architecture, Neural Architecture Search, Architecture Benchmarks, Convergence, Expressivity, Generalization

TL;DR: We jointly analyzed the expressivity, convergence, and generalization of deep networks, and discovered the “No Free Lunch” behavior in deep networks defined in an architecture space.

Abstract: The prosperity of deep learning and automated machine learning (AutoML) is largely rooted in the development of novel neural networks, but what defines and controls the "goodness" of networks in an architecture space? Test accuracy, the gold standard in AutoML, is closely related to three aspects: (1) expressivity (how complex a function a network can approximate over the training data); (2) convergence (how quickly the network reaches low training error under gradient descent); (3) generalization (whether a trained network generalizes from the training data to unseen samples with low test error). However, most previous theory papers focus on fixed model structures, largely ignoring the sophisticated networks used in practice. To facilitate the interpretation and understanding of architecture design in AutoML, we target a bigger picture: how does an architecture jointly impact its expressivity, convergence, and generalization? We demonstrate "no free lunch" behavior in networks from an architecture space: given a fixed budget on the number of parameters, no single architecture is optimal in all three aspects. In other words, separately optimizing expressivity, convergence, and generalization yields different networks in the architecture space. Our analysis can explain a wide range of observations in AutoML. Experiments on popular benchmarks confirm our theoretical analysis. Our code is attached in the supplement.

Submission Checklist: Yes

Broader Impact Statement: Yes

Paper Availability And License: Yes

Code Of Conduct: Yes

Reviewers: Yes

CPU Hours: 800

GPU Hours: 800

TPU Hours: 0

Evaluation Metrics: No

Code And Dataset Supplement: zip

Estimated CO2e Footprint: 103.68

Steps For Environmental Footprint Reduction During Development:

### Prerequisites
- Ubuntu 16.04
- Python 3.6.9
- CUDA 11.1 (lower versions may work but were not tested)
- NVIDIA GPU + CuDNN v7.3

This code has been tested on a V100 GPU. Configurations may need to be changed on different platforms.

### Installation
* Install dependencies:
```bash
pip install -r requirements.txt
```
* Download Tiny ImageNet (CIFAR-10 and CIFAR-100 will be automatically downloaded by Torchvision): https://gist.github.com/moskomule/2e6a9a463f50447beca4e64ab4699ac4
