On the selection of neural architectures from a supernetDownload PDF

Published: 16 May 2023, Last Modified: 15 Sept 2023AutoML 2023 WorkshopReaders: Everyone
Abstract: After DARTS provided a method utilizing a supernet to search for neural network architectures entirely through gradient descent, differentiable supernet-based methods emerged as a powerful and popular approach to efficient neural architecture search (NAS). Following works improved upon many aspects of the DARTS algorithm but generally kept the original method of selecting the final architecture, pruning the lowest magnitude architecture weights. though critiques of this approach have led to alternative architecture selection mechanisms, such as a perturbation-based method. Here we perform a broad comparative evaluation of architecture selection methods in combination with different techniques for training the supernet, and highlight the interdependence between various methods for supernet training and architecture selection mechanisms. We show the potential for improved results for many NAS supernet training methods via alternate architecture selection mechanisms relative to the pruning-based architecture selection they were introduced, and are typically evaluated, with. In evaluating architecture selection methods, we also demonstrate how zero-shot NAS methods may be effectively integrated into supernet NAS training as new architecture selection mechanisms.
Keywords: neural architecture search, meta-analysis, evaluation, zero-shot NAS, differentiable NAS, supernet
Submission Checklist: Yes
Broader Impact Statement: Yes
Paper Availability And License: Yes
Code Of Conduct: Yes
Reviewers: Yes
Steps For Environmental Footprint Reduction During Development: The entirety of experimental design and development was conducted only using NAS benchmarks. Further experiments were implemented in a larger search space only once experimental designs were were tested in the benchmark in order minimize the unnecessary training of neural networks. All initial experiments during development were conducted with the lowest number of training epochs reported. Checkpointing was implemented within neural network training scripts so that any runs which failed did not have to be restarted from scratch.
CPU Hours: 0.5
GPU Hours: 25400
TPU Hours: 0
Evaluation Metrics: No
Estimated CO2e Footprint: 3292
13 Replies