Abstract: Neural architecture search (NAS) seeks to automate neural network design to optimize performance criteria, but designing the search space for NAS itself largely remains a manual effort. When available, strong prior knowledge can be used to construct small search spaces, but relying on such spaces inevitably limits the flexibility of NAS, and prior information is not always available for novel tasks or architectures.
On the other hand, many NAS methods have been shown to be sensitive to the choice of search space and struggle when the search space is not sufficiently refined. To address this problem, we propose a differentiable technique that learns a policy to refine a broad initial search space during supernet training. Our proposed solution is orthogonal to almost all existing improvements to NAS pipelines, is largely search-space-agnostic, and incurs little additional overhead beyond standard supernet training. Despite its simplicity, we show that on tasks without strong priors, our solution consistently discovers performant subspaces within an initially large, complex search space (where even state-of-the-art methods underperform), significantly improves the robustness of the resultant supernet, and improves performance across a wide range of model sizes. We argue that our work takes a step toward full automation of the network design pipeline.
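For intuition only, the sketch below shows one way a differentiable search-space refinement policy of this general kind could be set up in PyTorch: learnable logits over each architectural choice dimension, trained jointly with the supernet through a Gumbel-softmax relaxation, then thresholded to keep only the promising options. This is an illustrative assumption rather than the paper's implementation; the class name, choice dimensions, and threshold are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SearchSpacePolicy(nn.Module):
    """Learnable categorical distribution over each architectural choice dimension (illustrative)."""
    def __init__(self, options_per_dim):
        super().__init__()
        # One logit vector per dimension (e.g. depth, width, kernel size) -- all hypothetical.
        self.logits = nn.ParameterList(
            [nn.Parameter(torch.zeros(n)) for n in options_per_dim]
        )

    def sample(self, tau=1.0):
        # Differentiable (approximately one-hot) samples via straight-through Gumbel-softmax.
        return [F.gumbel_softmax(logit, tau=tau, hard=True) for logit in self.logits]

    def refine(self, keep_prob=0.1):
        # Shrink the search space: keep only options whose learned probability exceeds a threshold.
        probs = [F.softmax(logit, dim=-1) for logit in self.logits]
        return [(p > keep_prob).nonzero(as_tuple=True)[0].tolist() for p in probs]

# Usage sketch: sample a sub-network configuration at each supernet training step,
# backpropagate the task loss into both supernet weights and policy logits,
# then call refine() to read off the learned, smaller search space.
policy = SearchSpacePolicy(options_per_dim=[4, 3, 3])   # e.g. 4 depths, 3 widths, 3 kernel sizes
choice_one_hots = policy.sample(tau=1.0)
print([oh.argmax().item() for oh in choice_one_hots])   # indices of the sampled options
print(policy.refine(keep_prob=0.1))                     # surviving option indices per dimension
```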
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We've highlighted all changes and additions in blue text in the revised PDF.
Summary of changes:
- Added more high-level, intuitive explanations of our method.
- Added an explanation of the proposed method's ability to expand search spaces.
- Added a note on the potential applicability of the proposed method to high-cardinality or continuous search spaces.
- Added explanations of how the search spaces in this paper differ from those in previous works.
- Added discussions of additional related works, such as NAT and NSGAv2.
- Added plots showing the evolution of Pareto fronts.
- Added a quantification of the differences between BL and the baseline via hypervolumes.
- Added comparisons against additional baselines (AttentiveNAS and AlphaNet) and experiments combining these methods with boundary learning.
- Added further ablations, including the robustness of the method 1) under a different optimizer (AdamW) and 2) across different random seeds.
- Made miscellaneous changes suggested by the reviewers, such as clarifying previously unclear explanations and correcting typos.
Assigned Action Editor: ~Elliot_Meyerson1
Submission Number: 1487