Keywords: Neural Architecture Search, Zero Cost Neural Architecture Search, Efficient NAS, Self-supervised Learning, Computer Vision
TL;DR: We propose a configurable, extensible, and efficient proxy task for evaluating neural architectures, together with a search method that extends the proxy to various downstream tasks and search spaces.
Abstract: Neural Architecture Search (NAS) has become a de facto approach in the recent trend of AutoML to design deep neural networks (DNNs). Efficient or near-zero-cost NAS proxies have further been proposed to address the heavy computational demands of NAS; with such proxies, each candidate architecture requires only one iteration of backpropagation. The values obtained from the proxies are treated as predictions of architecture performance on downstream tasks. However, two significant drawbacks hinder the wider use of efficient NAS proxies: (1) efficient proxies are not adaptive to various search spaces, and (2) efficient proxies are not extensible to multi-modality downstream tasks. Based on these observations, we design an Extensible proxy (Eproxy) that utilizes self-supervised, few-shot training (i.e., 10 iterations of backpropagation), which yields near-zero costs. The key component that makes Eproxy efficient is an untrainable convolution layer, termed the barrier layer, that adds non-linearities to the optimization space so that Eproxy can discriminate the performance of architectures at an early stage. Furthermore, to make Eproxy adaptive to different downstream tasks and search spaces, we propose Discrete Proxy Search (DPS) to find the optimized training settings for Eproxy with only a handful of benchmarked architectures on the target tasks. Our extensive experiments confirm the effectiveness of both Eproxy and Eproxy+DPS. On NAS-Bench-101 (~423k architectures), Eproxy achieves a Spearman $\rho$ of 0.65, whereas the previous best zero-cost method achieves 0.45. On the NDS-ImageNet search spaces, Eproxy+DPS delivers an average Spearman $\rho$ ranking correlation of 0.73, while the previous efficient proxy achieves only 0.47. On the NAS-Bench-Trans-Micro search space (7 tasks), Eproxy+DPS delivers performance comparable to early-stopping methods, which require 660 GPU hours per task. For end-to-end tasks such as DARTS-ImageNet-1k, our method delivers better results than NAS performed on CIFAR-10 while requiring only one GPU hour and a single batch of CIFAR-10 images.
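The ranking correlations quoted in the abstract are Spearman $\rho$ values between proxy scores and benchmarked performance. As a minimal sketch of how such a number can be computed (not the authors' evaluation code), the following pure-Python snippet ranks both lists, averaging ranks for ties, and takes the Pearson correlation of the ranks. The function names `average_ranks` and `spearman_rho`, and the example numbers, are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(proxy_scores: Sequence[float], accuracies: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    if len(proxy_scores) != len(accuracies) or len(proxy_scores) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    rx, ry = average_ranks(proxy_scores), average_ranks(accuracies)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: one proxy score and one benchmarked accuracy per architecture.
# Assumption: higher proxy score is meant to indicate a better architecture.
example_scores = [0.10, 0.40, 0.35, 0.80]
example_accuracies = [90.0, 92.0, 93.0, 95.0]
example_rho = spearman_rho(example_scores, example_accuracies)
```

If a proxy is loss-like, so that lower values indicate better architectures, its scores would need to be negated before calling `spearman_rho` for the sign of the result to be comparable.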
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning